bioinfo-biols / CIRI-cookbook

Document for CIRI-series software
https://ciri-cookbook.readthedocs.io/en/latest/index.html

Annotation file #32

Open jilguero888 opened 6 months ago

jilguero888 commented 6 months ago

I am working with a non-model organism for which only a GFF3 annotation is available (no GTF). CIRI-full crashes when I load the GFF3 file, and it also crashes after I convert the GFF3 file to GTF with gffread. I tested CIRI-full with the "test" data provided by the package and with different annotation files, with these results:

1- with the original test data (test_ref.fa, test_anno.gtf), it works.
2- with the human GTF file (gencode.v45.annotation.gtf) downloaded from GENCODE, it works.
3- with the human GFF3 file (gencode.v45.annotation.gff3) downloaded from GENCODE, it breaks. The standard output shows:

Loading Annotation...
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 1
    at Merge3.getmerge(Merge3.java:77)
    at CIRI_Full2.main(CIRI_Full2.java:280)

and the CIRIerror.log file reports:

Died at /opt/CIRI/CIRI-full_v2.0//bin/CIRI_v2.0.6/CIRI2.pl line 281, line 1000.
Died at /opt/CIRI/CIRI-full_v2.0//bin/CIRI_v2.0.6/CIRI2.pl line 281, line 1000.
Died at /opt/CIRI/CIRI-full_v2.0//bin/CIRI_v2.0.6/CIRI2.pl line 281, line 1000.

and the ciri_pipe_1.log reports:

The GFF file provided cannot be understood by CIRI! Please refer to manual for details of required GFF formats
Fatal error. Aborted.
No circRNA list found at designated directory!
The GFF file provided cannot be understood by CIRI-AS! Please refer to manual for details of required GFF formats.

4- with the human GTF file converted from the GFF3 file by gffread, it breaks, with similar outputs. The converted GTF file has the "gene_id" and "transcript_id" fields in column 9, as detailed in the documentation.
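The column 9 check described above can be sketched in a few lines of Python. This is only an illustration, not part of CIRI-full: the function names `attribute_style` and `check_line` are hypothetical, and the sketch assumes the usual conventions that GTF attributes look like `key "value";` while GFF3 attributes look like `key=value`.

```python
import re

# GTF-style attribute: a key, one space, then a double-quoted value.
GTF_ATTR = re.compile(r'(\S+) "([^"]*)"')


def attribute_style(attr_field):
    """Guess whether column 9 uses GTF or GFF3 attribute syntax."""
    if GTF_ATTR.search(attr_field):
        return "gtf"
    if "=" in attr_field:
        return "gff3"
    return "unknown"


def check_line(line, required=("gene_id", "transcript_id")):
    """Return (style, missing_keys) for one annotation line.

    Comment and blank lines return None. Lines that do not have exactly
    nine tab-separated columns are reported as "malformed".
    """
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return ("malformed", list(required))
    style = attribute_style(fields[8])
    if style == "gtf":
        keys = {key for key, _ in GTF_ATTR.findall(fields[8])}
    elif style == "gff3":
        keys = {kv.split("=", 1)[0].strip()
                for kv in fields[8].split(";") if "=" in kv}
    else:
        keys = set()
    return (style, [key for key in required if key not in keys])
```

Running `check_line` over the exon lines of each annotation file shows whether a file is GTF-style or GFF3-style and whether both identifiers are present. Gene-level lines often carry no `transcript_id`, so a missing key there is not necessarily a problem.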

I think there is a problem with the parsing of annotation formats at one or several levels of the pipeline. Any help? Thank you.

jilguero888 commented 6 months ago

An additional comment: I am using CIRI-full_v2.0 and an updated CIRI_Full_v2.1.1.jar that I found, and both show the same problem when dealing with annotation files.

Kevinzjy commented 6 months ago

Hi @jilguero888 , could you try the modified version from https://github.com/bioinfo-biols/CIRIquant/blob/master/libs/CIRI2.pl ?

jilguero888 commented 6 months ago

Hi, thank you. This helped. With the new CIRI2.pl from GitHub, that step works. However, the pipeline now halts at a later step. Here is the error output:

Loading Annotation...
Exception in thread "main" java.lang.reflect.InvocationTargetException
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 1
    at Merge3.getmerge(Merge3.java:79)
    at CIRI_Full2.main(CIRI_Full2.java:280)
    ... 5 more

The "ciri_pipe_1.log" file outputs at the end:

The GFF file provided cannot be understood by CIRI-AS! Please refer to manual for details of required GFF formats.

When running the commands step by step, it stops in the CIRI_AS_v1.2.pl script. The error is:

The GFF file provided cannot be understood by CIRI-AS! Please refer to manual for details of required GFF formats.

I think CIRI_AS also has a problem with the annotation file. I found a version CIRI_AS_v1.1.pl on SourceForge, but it does not work, and I cannot find a more recent version on GitHub.

Kevinzjy commented 6 months ago

@jilguero888, I have included the updated versions of CIRI2 and CIRI-AS in https://github.com/bioinfo-biols/CIRI-full/tree/master/bin. Please give it a try.

jilguero888 commented 6 months ago

Hi, thank you. This has fixed the second problem: both CIRI2 and CIRI-AS now run to completion when launched as independent Perl programs. However, when running the full pipeline with the updated file CIRI-Full_v2.1.2.jar, an exception is raised:

Error: LinkageError occurred while loading main class CIRI_Full2
    java.lang.UnsupportedClassVersionError: CIRI_Full2 has been compiled by a more recent version of the Java Runtime (class file version 65.0), this version of the Java Runtime only recognizes class file versions up to 55.0

My Java version is 11:

openjdk version "11.0.22" 2024-01-16
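For readers decoding the version numbers in the error above: class file major version 65 corresponds to Java 21 and 55 to Java 11, so the jar was built for a newer release than the installed runtime. The small Python sketch below shows the mapping and how the major version can be read from the first bytes of a `.class` file. The function names are hypothetical, and the `major - 44` rule is only applied here for Java 5 and later.

```python
import struct


def java_release_for_class_version(major):
    """Map a class file major version to the Java release that introduced it."""
    if major < 49:
        raise ValueError("this mapping is only used for major >= 49 (Java 5+)")
    return major - 44


def class_file_major(header):
    """Read the major version from the first 8 bytes of a .class file.

    Layout (big-endian): 4-byte magic 0xCAFEBABE, 2-byte minor, 2-byte major.
    """
    magic, _minor, major = struct.unpack(">IHH", header[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major
```

For example, `java_release_for_class_version(65)` gives 21 and `java_release_for_class_version(55)` gives 11, which matches the two versions named in the error message.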

I have tested the previous pipeline version CIRI_Full_v2.1.1.jar, which is compatible with my Java, but it fails at the "merge" step:

... Merge module start

Loading Annotation...
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 1
    at Merge3.getmerge(Merge3.java:77)
    at CIRI_Full2.main(CIRI_Full2.java:280)

I think that the new CIRI-Full_v2.1.2.jar should be compiled to be compatible with other java versions, but this is rather a question for you.

jilguero888 commented 5 months ago

Hi, I am wondering whether CIRI-Full_v2.1.2.jar should be compiled to be compatible with more java versions, similar to CIRI_Full_v2.1.1.jar which is compatible, or whether the raised problem is due to another thing.

Also, the exception raised after "Merge module start" when the full pipeline is run with CIRI_Full_v2.1.1.jar depends on the input annotation format. With a GTF file the pipeline works, but with a GFF3 file it breaks:

... Merge module start

Loading Annotation...
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 1
    at Merge3.getmerge(Merge3.java:77)
    at CIRI_Full2.main(CIRI_Full2.java:280)

I think the Merge module also has a problem handling GFF3 files that needs to be fixed (GTF files converted from GFF3 do not work with your software either). The Merge module seems to be essential for producing the final output (.anno file) that the graphical output depends on. Thank you.

Kevinzjy commented 5 months ago

> Hi, I am wondering whether CIRI-Full_v2.1.2.jar should be compiled to be compatible with more java versions, similar to CIRI_Full_v2.1.1.jar which is compatible, or whether the raised problem is due to another thing.

Hi @jilguero888, CIRI-Full only supports annotations in GTF format. Please convert your GFF3 file to GTF format. Besides, you need to update your JDK version to use the compiled binary (which is quite easy), or you can compile your own binary by running make in the cloned repository.

jilguero888 commented 5 months ago

Hi, thank you. CIRI-Full works when I use the updated programs you provided with a GTF file from GENCODE. However, I have converted the GFF3 file of my non-model organism into GTF format, and while the converted file works with other circular RNA programs (I am trying different packages to compare them), it does not work with your software. CIRI-Full also seems to depend on the source of the GTF file and to have problems with some annotations (there are a couple of other issues about this on the CIRI-cookbook).

It is hard to understand why only GTF support is provided for CIRI-Full. The GFF3 format is the standard for genome annotations set by The Sequence Ontology Consortium, and it is the latest version. GTF format is equivalent to GFF2 (a previous version). Requesting users with a GFF3 file to downgrade their annotation file to make it compatible with your software makes no sense.

Kevinzjy commented 5 months ago

Hi @jilguero888, you need to check the keys and values in the attribute column (column 9) to make sure your annotation file is in the same format as the GENCODE/UCSC GTF files before running CIRI-full.
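One way to carry out the check suggested above is to compare the attribute keys of a converted GTF with those of a GTF that is known to work, such as the GENCODE file. The following Python sketch does that per feature type. The function names are hypothetical, and the thread does not state which keys CIRI-full actually requires, so the output only highlights differences to inspect.

```python
from collections import defaultdict


def attribute_keys_by_feature(lines):
    """Collect the set of column 9 attribute keys seen for each feature type."""
    keys = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # skip lines that are not nine tab-separated columns
        # Simplification: assumes no ';' appears inside quoted values.
        for item in fields[8].split(";"):
            item = item.strip()
            if item:
                keys[fields[2]].add(item.split(None, 1)[0])
    return keys


def missing_keys(reference_lines, candidate_lines):
    """Return {feature: sorted keys present in the reference but not the candidate}."""
    ref = attribute_keys_by_feature(reference_lines)
    cand = attribute_keys_by_feature(candidate_lines)
    report = {}
    for feature, ref_keys in ref.items():
        diff = ref_keys - cand.get(feature, set())
        if diff:
            report[feature] = sorted(diff)
    return report
```

Both functions accept any iterable of lines, so open file handles for the two GTF files can be passed directly. An empty result means every key seen in the reference also appears in the candidate for the same feature type; value formats would still need to be checked by eye.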

Besides, I agree that support for GFF3 could be a further improvement, but for most genomes, support for GENCODE and UCSC GTF should be sufficient and is not "hard to understand" or "makes no sense".