gengit / FRAMA

FRAMA: From RNA-seq data to annotated mRNA assemblies

annotation.pl error #3

Closed francicco closed 8 years ago

francicco commented 8 years ago

Hi,

I just found another error:

perl annotation.pl \
        -new  \
        -taxon-assemble 323550 \
        -taxon-ortholog 7227  \
        -output-dir  transcripts \
        -output-file assembly/annotated.gbk \
        -trinity assembly/Trinity_preprocessed.fa \
        -reference GCF_000001215.4_Release_6_plus_ISO1_MT_rna.gbff \
        -assignment tables/annotation.csv \
        -read-index trinity/dir2comp.index \
        -scaffolding blast/scaffolding_candidates_trin2ref.csv \
        -blast blast/avg_CDS_ref2trin_sym.csv \
        -predictions -genscan-matrix genscanlinux/HumanIso.smat -cpus 15 -ortholog-table  -ortholog-cds  \
            > logs/annotation.log \
            2> logs/annotation.err
make: *** [assembly/annotated.gbk] Error 2

Use of uninitialized value in -e at FRAMA-master/src/annotation.pl line 245.
ERROR: Missing files:
    Ortholog table not found.
    Ortholog CDS sequences not found.
Died at FRAMA-master/src/annotation.pl line 254.

The only thing that seems to be wrong is that blast/scaffolding_candidates_trin2ref.csv is missing.
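Before starting a long annotation.pl run, you can confirm which inputs are really missing by testing each path passed on the command line. The helper below is a hypothetical sketch, not part of FRAMA. It only checks that each file exists and is non-empty. Note that the "Ortholog table not found" messages above appear to come from -ortholog-table and -ortholog-cds being passed without a value (hence the "Use of uninitialized value in -e" warning), and no file check can catch that.

```shell
# Hypothetical pre-flight check (not part of FRAMA): print every argument
# that is not an existing, non-empty file; return 1 if any were found.
missing_inputs() {
    status=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            printf 'missing or empty: %s\n' "$f"
            status=1
        fi
    done
    return "$status"
}
```

For example, with the paths from the call above: missing_inputs assembly/Trinity_preprocessed.fa GCF_000001215.4_Release_6_plus_ISO1_MT_rna.gbff tables/annotation.csv trinity/dir2comp.index blast/scaffolding_candidates_trin2ref.csv blast/avg_CDS_ref2trin_sym.csv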

francicco commented 8 years ago

OK, I think I understand which ortholog table it is referring to. But I'm not going to use it for now, so how do I tell FRAMA not to use it? Thanks, and sorry for bothering you so much.

francicco commented 8 years ago

I executed annotation.pl manually, but I got this:

Reading reference genbank
Reading trinity contigs
Reading blast result for CDS annotation
Reading blast result for scaffolding fragments
Retrieving hash: component => readfile
Starting...
ERROR: Readfile not found for component: TRINITY
ERROR: No read file found: TRINITY_DN1002_c0_g1_i1
ERROR: Readfile not found for component: TRINITY
ERROR: No read file found: TRINITY_DN1022_c0_g1_i1
TRINITY_DN1002_c0_g1_i1 NM_057428   RpL40    1 wallclock secs ( 0.04 usr  0.01 sys +  0.36 cusr  0.64 csys =  1.05 CPU)
TRINITY_DN1022_c0_g1_i1 NM_140813   RpL26    1 wallclock secs ( 0.05 usr  0.01 sys +  0.39 cusr  0.65 csys =  1.10 CPU)
ERROR: Readfile not found for component: TRINITY
ERROR: No read file found: TRINITY_DN1100_c0_g1_i1

What is wrong now...

mrtnbns commented 8 years ago

Trinity has changed the naming pattern of its sequences and its directory structure. The last version that is known to work is r20140717. This will be fixed.

Regarding your first issue: drop the -ortholog-table and -ortholog-cds parameters.

francicco commented 8 years ago

Thanks,

Do you know when the update will be done?

F

mrtnbns commented 8 years ago

Should be ready by the end of next week.

francicco commented 8 years ago

Thank you!

francicco commented 8 years ago

Hi everyone,

Now that I have the ortholog table, I'm running annotation.pl to check whether it works, but apparently I have run into another error:

make: *** [.../FRAMA_out/assembly/annotated.gbk] Error 25

The log file looks like this:

Reading reference genbank
Reading trinity contigs
Reading blast result for CDS annotation
Reading blast result for scaffolding fragments
Reading ortholog table
Indexing ortholog CDS fasta

What exactly is the problem?

Thank you,
F

mrtnbns commented 8 years ago

Hey,

I guess it is still the same problem, as I haven't fixed it yet.

Can you provide your annotation.pl call and the content of logs/annotation.err?

francicco commented 8 years ago

I guess it is my problem now... I think...

WARNING: Failed to connect to taxonomy database. No internet access?
WARNING: Could not retrieven taxon information for ID: 323550
WARNING: Failed to connect to taxonomy database. No internet access?
WARNING: Could not retrieven taxon information for ID: 7227
WARNING: Failed to connect to taxonomy database. No internet access?
WARNING: Could not retrieven taxon information for ID: 307658
WARNING: Failed to connect to taxonomy database. No internet access?
WARNING: Could not retrieven taxon information for ID: 609295
WARNING: Failed to connect to taxonomy database. No internet access?
WARNING: Could not retrieven taxon information for ID: 7460
WARNING: Failed to connect to taxonomy database. No internet access?
WARNING: Could not retrieven taxon information for ID: 67767
WARNING: Failed to connect to taxonomy database. No internet access?
WARNING: Could not retrieven taxon information for ID: 7425
WARNING: Failed to connect to taxonomy database. No internet access?
WARNING: Could not retrieven taxon information for ID: 610380
WARNING: Failed to connect to taxonomy database. No internet access?
WARNING: Could not retrieven taxon information for ID: 411798
WARNING: Failed to connect to taxonomy database. No internet access?
WARNING: Could not retrieven taxon information for ID: 443821
WARNING: Failed to connect to taxonomy database. No internet access?
WARNING: Could not retrieven taxon information for ID: 64793
WARNING: Failed to connect to taxonomy database. No internet access?
WARNING: Could not retrieven taxon information for ID: 104421
WARNING: Failed to connect to taxonomy database. No internet access?
WARNING: Could not retrieven taxon information for ID: 13686
WARNING: Failed to connect to taxonomy database. No internet access?
WARNING: Could not retrieven taxon information for ID: 144034
WARNING: Failed to connect to taxonomy database. No internet access?
WARNING: Could not retrieven taxon information for ID: 7227
WARNING: Failed to connect to taxonomy database. No internet access?
WARNING: Could not retrieven taxon information for ID: 12957
WARNING: Failed to connect to taxonomy database. No internet access?
WARNING: Could not retrieven taxon information for ID: 83485
WARNING: Failed to connect to taxonomy database. No internet access?
WARNING: Could not retrieven taxon information for ID: 103372

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Each line of the qual file must be less than 65,536 characters. Line 76129 is 70264 chars.
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:486
STACK: Bio::DB::IndexedBase::_check_linelength /usr/local/share/perl5/Bio/DB/IndexedBase.pm:738
STACK: Bio::DB::Fasta::_calculate_offsets /usr/local/share/perl5/Bio/DB/Fasta.pm:175
STACK: Bio::DB::IndexedBase::_index_files /usr/local/share/perl5/Bio/DB/IndexedBase.pm:642
STACK: Bio::DB::IndexedBase::index_file /usr/local/share/perl5/Bio/DB/IndexedBase.pm:484
STACK: Bio::DB::IndexedBase::new /usr/local/share/perl5/Bio/DB/IndexedBase.pm:364
STACK: annotation.pl:307
francicco commented 8 years ago

Apparently the first problem ("No internet access") comes from the cluster: submitted jobs have no internet connection. If I run it from the front-end, the connection is available and only the second error pops up:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Each line of the qual file must be less than 65,536 characters. Line 76129 is 70264 chars.
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:486
STACK: Bio::DB::IndexedBase::_check_linelength /usr/local/share/perl5/Bio/DB/IndexedBase.pm:738
STACK: Bio::DB::Fasta::_calculate_offsets /usr/local/share/perl5/Bio/DB/Fasta.pm:175
STACK: Bio::DB::IndexedBase::_index_files /usr/local/share/perl5/Bio/DB/IndexedBase.pm:642
STACK: Bio::DB::IndexedBase::index_file /usr/local/share/perl5/Bio/DB/IndexedBase.pm:484
STACK: Bio::DB::IndexedBase::new /usr/local/share/perl5/Bio/DB/IndexedBase.pm:364
STACK: annotation.pl:307

Which file is annotation.pl complaining about? Maybe it is not formatted properly.

Thanks F

mrtnbns commented 8 years ago

Hey,

I added support for genome-guided mode and more recent Trinity versions to the 'dev' branch.

git clone https://github.com/gengit/FRAMA.git
cd FRAMA
git checkout dev

Even if you did not use FRAMA to run Trinity, you need to add the "--genome_guided_bam" parameter to OPT_TRINITY so that FRAMA knows it is dealing with Trinity's genome-guided output. For instance:

OPT_TRINITY   := --max_memory 10G --seqType fa --genome_guided_max_intron 10000 --genome_guided_bam /my/path/to/alignment.bam

This should work for Trinity versions >2.1.

-- Martin

francicco commented 8 years ago

Thank you Martin,

I'll give it a try immediately!

francicco commented 8 years ago

I get the same error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Each line of the qual file must be less than 65,536 characters. Line 76129 is 70264 chars.
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:486
STACK: Bio::DB::IndexedBase::_check_linelength /usr/local/share/perl5/Bio/DB/IndexedBase.pm:738
STACK: Bio::DB::Fasta::_calculate_offsets /usr/local/share/perl5/Bio/DB/Fasta.pm:175
STACK: Bio::DB::IndexedBase::_index_files /usr/local/share/perl5/Bio/DB/IndexedBase.pm:642
STACK: Bio::DB::IndexedBase::index_file /usr/local/share/perl5/Bio/DB/IndexedBase.pm:484
STACK: Bio::DB::IndexedBase::new /usr/local/share/perl5/Bio/DB/IndexedBase.pm:364
STACK: annotation.pl:302
-----------------------------------------------------------
mrtnbns commented 8 years ago

Thanks for trying.

This error comes from BioPerl while it indexes the ortholog file (which should be in FASTA format). It looks like it is complaining about lines that are too long. If you have the FASTX-Toolkit installed, try using fasta_formatter to wrap the lines: http://hannonlab.cshl.edu/fastx_toolkit/commandline.html#fasta_formatter_usage

BioPerl states:

The Fasta files may contain any combination of nucleotide and protein sequences; during indexing the module guesses the molecular type. Entries may have any line length up to 65,536 characters, and different line lengths are allowed in the same file. However, within a sequence entry, all lines must be the same length except for the last. An error will be thrown if this is not the case.
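The following sketch checks whether a FASTA file violates the limit quoted above and re-wraps it without the FASTX-Toolkit. It assumes plain FASTA input (header lines start with ">") and uses only POSIX awk. The default width of 60 is an arbitrary choice, and the function names are hypothetical.

```shell
# Print the length of the longest line in a file. Bio::DB::Fasta requires
# every line to be shorter than 65,536 characters.
max_line_length() {
    awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }' "$1"
}

# Re-wrap sequence lines read from stdin to a fixed width (default 60).
# Each entry is re-emitted in equal-width chunks, so all of its lines have
# the same length except the last, as Bio::DB::Fasta expects. Header lines
# are passed through unchanged; whitespace and CRs inside sequences are dropped.
wrap_fasta() {
    awk -v width="${1:-60}" '
        BEGIN { if (width < 1) width = 60 }
        function emit_seq(   i) {
            for (i = 1; i <= length(seq); i += width)
                print substr(seq, i, width)
            seq = ""
        }
        /^>/ { emit_seq(); print; next }
        { gsub(/[ \t\r]/, ""); seq = seq $0 }
        END { emit_seq() }
    '
}
```

Usage, with a hypothetical file name: max_line_length orthologs_cds.fa to see whether the file exceeds the limit, then wrap_fasta 60 < orthologs_cds.fa > orthologs_cds.wrapped.fa.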

francicco commented 8 years ago

Ok,

Running annotation.pl without orthologs gives me a lot of these:

ERROR: mRNA clipping failed

Is that normal?

Now I'll try wrapping the FASTA. Thanks, F

mrtnbns commented 8 years ago

No, it's not normal.

Could you provide the output of head FRAMA_OUTPUT/trinity/dir2comp.csv and the content of 3UTR.err for an arbitrary transcript (FRAMA_OUTPUT/transcripts/*/3UTR.err)?

You are using Trinity Version >2.1 with genome-guided mode, right?

francicco commented 8 years ago

dir2comp.csv:

c61784  /global/lv70640/c7701100/Projects/Talp_FRAMA_annotation/FRAMA_out/trinity/read_partitions/Fb_0/CBin_616/c61784.trinity.reads.fa
c61756  /global/lv70640/c7701100/Projects/Talp_FRAMA_annotation/FRAMA_out/trinity/read_partitions/Fb_0/CBin_616/c61756.trinity.reads.fa
c61764  /global/lv70640/c7701100/Projects/Talp_FRAMA_annotation/FRAMA_out/trinity/read_partitions/Fb_0/CBin_616/c61764.trinity.reads.fa
c61773  /global/lv70640/c7701100/Projects/Talp_FRAMA_annotation/FRAMA_out/trinity/read_partitions/Fb_0/CBin_616/c61773.trinity.reads.fa
c61787  /global/lv70640/c7701100/Projects/Talp_FRAMA_annotation/FRAMA_out/trinity/read_partitions/Fb_0/CBin_616/c61787.trinity.reads.fa
c61791  /global/lv70640/c7701100/Projects/Talp_FRAMA_annotation/FRAMA_out/trinity/read_partitions/Fb_0/CBin_616/c61791.trinity.reads.fa
c61826  /global/lv70640/c7701100/Projects/Talp_FRAMA_annotation/FRAMA_out/trinity/read_partitions/Fb_0/CBin_616/c61826.trinity.reads.fa
c61819  /global/lv70640/c7701100/Projects/Talp_FRAMA_annotation/FRAMA_out/trinity/read_partitions/Fb_0/CBin_616/c61819.trinity.reads.fa
c61744  /global/lv70640/c7701100/Projects/Talp_FRAMA_annotation/FRAMA_out/trinity/read_partitions/Fb_0/CBin_616/c61744.trinity.reads.fa
c61763  /global/lv70640/c7701100/Projects/Talp_FRAMA_annotation/FRAMA_out/trinity/read_partitions/Fb_0/CBin_616/c61763.trinity.reads.fa

3UTR.err

sh: /home/lv70640/c7701100/software/FRAMA/src/polyamisc/polyaspwm_predict.pl: /usr/local/bin/perl: bad interpreter: No such file or directory
sh: /home/lv70640/c7701100/software/FRAMA/src/polyamisc/polyaspwm_score.pl: /usr/local/bin/perl: bad interpreter: No such file or directory

Error: Identification of PAS failed

I'm using Trinity 2.1.1, not in genome-guided mode.

mrtnbns commented 8 years ago

Thanks. Should work now.
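The 3UTR.err output above shows the shell refusing to start polyaspwm_predict.pl and polyaspwm_score.pl. Their first line names /usr/local/bin/perl, which does not exist on that machine. To find other scripts with the same problem in a local checkout, a generic check like this hypothetical helper could be used. It is not the change that was made in FRAMA. Calling a script explicitly as "perl script.pl", or starting it with "#!/usr/bin/env perl", avoids depending on a hard-coded interpreter path.

```shell
# Hypothetical helper: for each script given as an argument, print
# "<script>: <interpreter>" if its "#!" line names an interpreter that is
# not an executable file on this machine.
check_shebangs() {
    for f in "$@"; do
        [ -f "$f" ] || continue
        line=$(head -n 1 "$f")
        case $line in
            '#!'*) ;;
            *) continue ;;
        esac
        interp=${line#??}       # strip the leading "#!"
        interp=${interp# }      # tolerate "#! /path"
        interp=${interp%% *}    # drop interpreter arguments such as "-w"
        [ -x "$interp" ] || printf '%s: %s\n' "$f" "$interp"
    done
}
```

For example: check_shebangs FRAMA/src/polyamisc/*.pl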

francicco commented 8 years ago

Do I have to re-clone FRAMA?

mrtnbns commented 8 years ago

That should do it:

git pull origin dev
francicco commented 8 years ago

Another thing: since FRAMA uses gene symbols in its directory names, it gives this error:

sh: -c: line 0: syntax error near unexpected token `('
sh: -c: line 0: `genscan /home/lv70640/c7701100/software/genscanlinux/HumanIso.smat /global/lv70640/c7701100/Projects/Talp_FRAMA_annotation/FRAMA_out/transcripts/fs(1)h_NM_206647/contig.fa > /global/lv70640/c7701100/Projects/Talp_FRAMA_annotation/FRAMA_out/transcripts/fs(1)h_NM_206647/CDS_genscan.txt 2> /global/lv70640/c7701100/Projects/Talp_FRAMA_annotation/FRAMA_out/transcripts/fs(1)h_NM_206647/CDS_genscan.txt.err'

This is caused by the (very annoying) parentheses in the gene name. You may want to use a different ID.
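The failing command is built as a single string and run through sh -c, so the parentheses in fs(1)h are parsed as shell syntax. There are two possible workarounds: quote each path in the generated command, or derive directory names that contain only shell-safe characters. The sketch below shows the second option with a hypothetical helper, safe_name. It is an illustration, not FRAMA's own naming scheme. Different symbols can map to the same sanitized name (for example, fs(1)h and fs_1_h), so the accession suffix should be kept to keep names unique.

```shell
# Hypothetical helper: replace every character other than letters, digits,
# ".", "_" and "-" with "_", giving a name that is safe unquoted in sh.
safe_name() {
    printf '%s\n' "$1" | sed 's/[^A-Za-z0-9._-]/_/g'
}
```

For example, safe_name 'fs(1)h_NM_206647' yields fs_1_h_NM_206647.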

francicco commented 8 years ago

About the genome-guided (GG) Trinity: so far it's working. The only thing is that I had to rename the assembly from Trinity-GG.fasta to Trinity.fasta; the default output name in Trinity's genome-guided mode is Trinity-GG.fasta.

Thank you anyway! F

gengit commented 8 years ago

Well, the script that links the contig output to the component-grouped read lists relies on the file name Trinity-GG.fasta. So a symbolic link providing both file names would be optimal.


Karol Szafranski Genome Analysis Leibniz Institute on Aging - Fritz Lipmann Institute (FLI) Beutenbergstrasse 11, D-07745 Jena, Germany voice : ++49-3641-656804 fax : ++49-3641-656255 email : karol.szafranski@leibniz-fli.de web site: http://www.leibniz-fli.de

Scientific Director: Prof. Dr. K. Lenhard Rudolph; Administrative Director: Dr. Daniele Barthel; Chair of the Board of Trustees: Burkhard Zinner; Register of Associations No.: 230296, Jena Local Court; VAT ID No.: DE 153 925 464; Tax No.: 162/141/08228

mrtnbns commented 8 years ago

Nice to know that it is working. Thank you Francicco.

Currently, FRAMA moves Trinity-GG.fasta to Trinity.fasta to have a consistent source for subsequent steps. But it is correct that a symbolic link would be better.

I think this issue is solved.
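The symbolic-link approach discussed above could look like the following sketch. The helper name and its directory argument are assumptions for illustration. The link target is relative, so the link keeps working if the output directory is moved. An existing Trinity.fasta is never overwritten.

```shell
# Hypothetical helper: make Trinity-GG.fasta in the given directory also
# reachable as Trinity.fasta, without renaming the original file.
link_gg_assembly() {
    dir=$1
    if [ -e "$dir/Trinity-GG.fasta" ] && [ ! -e "$dir/Trinity.fasta" ]; then
        # relative target: resolved from the directory containing the link
        ln -s Trinity-GG.fasta "$dir/Trinity.fasta"
    fi
}
```

For example: link_gg_assembly FRAMA.GG.out/trinity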

francicco commented 8 years ago

Sorry Martin... but

perl store_hash.pl -input FRAMA.GG.out/trinity/dir2comp.csv -c1 1 -c2 2 -noarray -output FRAMA.GG.out/trinity/dir2comp.index
Input file empty
make: *** [FRAMA.GG.out/trinity/dir2comp.index] Error 255

The error persists... F

mrtnbns commented 8 years ago

Thanks for your patience and persistence.

FRAMA.GG.out/trinity/dir2comp.csv seems to be empty. Can you verify that?

So you called FRAMA with your own genome-guided assembly, after renaming or symlinking Trinity-GG.fasta to Trinity.fasta?

What is the output of

find FRAMA.GG.out/trinity/ -name "*Trinity.fasta" | head
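A small check along these lines shows whether dir2comp.csv is empty and whether every read file it lists actually exists. It assumes the two whitespace-separated columns shown earlier in this thread (component ID, read-file path). check_dir2comp is a hypothetical name. It returns non-zero for an empty or missing table. Otherwise, it prints the number of missing read files and lists each one on stderr.

```shell
# Hypothetical check for a dir2comp.csv-style table:
# column 1 = component ID, column 2 = path of its read file.
check_dir2comp() {
    table=$1
    if [ ! -s "$table" ]; then
        echo "empty or missing: $table" >&2
        return 1
    fi
    missing=0
    while read -r comp path _; do
        [ -n "$path" ] || continue
        if [ ! -f "$path" ]; then
            echo "missing read file for $comp: $path" >&2
            missing=$((missing + 1))
        fi
    done < "$table"
    echo "$missing"
}
```

For example: check_dir2comp FRAMA.GG.out/trinity/dir2comp.csv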
francicco commented 8 years ago
FRAMA.GG.out/trinity/Trinity.fasta
FRAMA.GG.out/trinity/Dir_Talp.mRNA.onto.Genome.Aligned.sortedByCoord.out.bam.+.sam.minC1.gff/scaffold_1330/0/1_1921.trinity.reads.out.Trinity.fasta
FRAMA.GG.out/trinity/Dir_Talp.mRNA.onto.Genome.Aligned.sortedByCoord.out.bam.+.sam.minC1.gff/scaffold_693/0/2800_3551.trinity.reads.out.Trinity.fasta
FRAMA.GG.out/trinity/Dir_Talp.mRNA.onto.Genome.Aligned.sortedByCoord.out.bam.+.sam.minC1.gff/scaffold_512/0/29651_30353.trinity.reads.out.Trinity.fasta
FRAMA.GG.out/trinity/Dir_Talp.mRNA.onto.Genome.Aligned.sortedByCoord.out.bam.+.sam.minC1.gff/scaffold_512/0/7682_11129.trinity.reads.out.Trinity.fasta
FRAMA.GG.out/trinity/Dir_Talp.mRNA.onto.Genome.Aligned.sortedByCoord.out.bam.+.sam.minC1.gff/scaffold_512/0/4224_5239.trinity.reads.out.Trinity.fasta
FRAMA.GG.out/trinity/Dir_Talp.mRNA.onto.Genome.Aligned.sortedByCoord.out.bam.+.sam.minC1.gff/scaffold_512/0/31_4062.trinity.reads.out.Trinity.fasta
FRAMA.GG.out/trinity/Dir_Talp.mRNA.onto.Genome.Aligned.sortedByCoord.out.bam.+.sam.minC1.gff/scaffold_512/0/22971_24215.trinity.reads.out.Trinity.fasta
FRAMA.GG.out/trinity/Dir_Talp.mRNA.onto.Genome.Aligned.sortedByCoord.out.bam.+.sam.minC1.gff/scaffold_512/0/18902_22809.trinity.reads.out.Trinity.fasta
FRAMA.GG.out/trinity/Dir_Talp.mRNA.onto.Genome.Aligned.sortedByCoord.out.bam.+.sam.minC1.gff/scaffold_19/0/1144801_1145394.trinity.reads.out.Trinity.fasta
mrtnbns commented 8 years ago

That looks fine.

Running perl FRAMA/src/find_ggreadlist.pl FRAMA.GG.out/trinity/ should give you a table that associates each Trinity ID with its read file.

Did you set the parameter I mentioned (--genome_guided_bam in OPT_TRINITY)? The actual file you specify does not matter because FRAMA will not call Trinity again.

francicco commented 8 years ago

Yes, I did. I specified:

OPT_TRINITY := --max_memory 10G --seqType fa --genome_guided_max_intron 10000 --genome_guided_bam FRAMA.GG.out/trinity/Talp.mRNA.onto.Genome.Aligned.sortedByCoord.out.bam

When I run find_ggreadlist.pl, it gives me a list of warnings like this:

WARNING: unspecific contig header information, c1_g1_i1 len=250 path=[228:0-249] [-1, 228, -2]
  1st FRAMA.GG.out/trinity/Dir_Talp.mRNA.onto.Genome.Aligned.sortedByCoord.out.bam.+.sam.minC1.gff/scaffold_19/0/1306533_1308021.trinity.reads
  2nd FRAMA.GG.out/trinity/Dir_Talp.mRNA.onto.Genome.Aligned.sortedByCoord.out.bam.+.sam.minC1.gff/scaffold_19/0/342901_344001.trinity.reads
....
gengit commented 8 years ago

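To get closer to the problem:

1. The listed warnings are of minor relevance. Do you get any other warnings further down? Try this, which skips each warning together with its two following lines:

awk '/unspecific contig header/ { skip = 3 } skip > 0 { skip--; next } { print }' log_file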

2. The contigs named in these warnings are probably identical. Just to confirm, look at the content of

FRAMA.GG.out/trinity/Dir_Talp.mRNA.onto.Genome.Aligned.sortedByCoord.out.bam.+.sam.minC1.gff/scaffold_19/0/1306533_1308021.trinity.reads

FRAMA.GG.out/trinity/Dir_Talp.mRNA.onto.Genome.Aligned.sortedByCoord.out.bam.+.sam.minC1.gff/scaffold_19/0/342901_344001.trinity.reads

3. Take a glance at FRAMA.GG.out/trinity/Trinity.fasta (or Trinity-GG.fasta):

grep '^>' FRAMA.GG.out/trinity/Trinity.fasta | head
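You can also check point 2 across all partitions at once. Judging from the warning text, find_ggreadlist.pl seems to complain when the same contig header line (ID, len and path) occurs in more than one per-partition assembly. That matching rule is an assumption here. The file-name pattern is taken from the find output earlier in this thread. The hypothetical sketch below lists every header line that occurs more than once under a given directory.

```shell
# Hypothetical helper: print header lines (without ">") that occur more
# than once across all per-partition assemblies below directory $1.
duplicate_headers() {
    find "$1" -name '*.trinity.reads.out.Trinity.fasta' \
        -exec awk '/^>/ { print substr($0, 2) }' {} + |
        sort | uniq -d
}
```

For example: duplicate_headers FRAMA.GG.out/trinity/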


francicco commented 8 years ago

1. I think it is the only warning... but it is still running.

2. I don't understand what I am supposed to see. These are actual reads.

3.

>TRINITY_GG_1_c0_g1_i1 len=211 path=[207:0-103 208:104-120 209:121-210] [-1, 207, 208, 209, -2]
>TRINITY_GG_1_c0_g2_i1 len=212 path=[210:0-103 211:104-121 212:122-211] [-1, 210, 211, 212, -2]
>TRINITY_GG_1_c1_g1_i1 len=250 path=[253:0-128 254:129-152 255:153-249] [-1, 253, 254, 255, -2]
>TRINITY_GG_1_c2_g1_i1 len=384 path=[384:0-164 385:165-185 386:186-383] [-1, 384, 385, 386, -2]
>TRINITY_GG_1_c2_g2_i1 len=385 path=[387:0-164 388:165-186 389:187-384] [-1, 387, 388, 389, -2]
>TRINITY_GG_1_c3_g1_i1 len=521 path=[523:0-24 524:25-520] [-1, 523, 524, -2]
>TRINITY_GG_1_c3_g2_i1 len=543 path=[525:0-46 526:47-542] [-1, 525, 526, -2]
>TRINITY_GG_1_c4_g1_i1 len=385 path=[1718:0-24 1719:25-26 1720:27-28 1721:29-30 1722:31-32 1723:33-34 1985:35-36 1956:37-38 1972:39-40 1984:41-42 1955:43-44 1971:45-46 2016:47-48 2018:49-229 2017:230-259 2022:260-300 2025:301-384] [-1, 1718, 1719, 1720, 1721, 1722, 1723, 1985, 1956, 1972, 1984, 1955, 1971, 2016, 2018, 2017, 2022, 2025, -2]
>TRINITY_GG_1_c4_g1_i2 len=1257 path=[1718:0-24 1719:25-26 1720:27-28 1721:29-30 1722:31-32 1723:33-34 1985:35-36 1956:37-38 1972:39-40 1984:41-42 1955:43-44 1971:45-46 2016:47-48 2018:49-229 2017:230-259 2022:260-300 2026:301-357 2028:358-359 2011:360-381 2014:382-383 2005:384-387 2010:388-463 1997:464-485 2004:486-507 1988:508-530 2021:531-539 2024:540-560 1974:561-563 1995:564-584 1986:585-607 1704:608-608 1705:609-624 1958:625-632 1939:633-1208 1829:1209-1209 1830:1210-1210 1831:1211-1211 1832:1212-1212 1833:1213-1213 1834:1214-1214 1835:1215-1215 1836:1216-1216 1837:1217-1217 1838:1218-1256] [-1, 1718, 1719, 1720, 1721, 1722, 1723, 1985, 1956, 1972, 1984, 1955, 1971, 2016, 2018, 2017, 2022, 2026, 2028, 2011, 2014, 2005, 2010, 1997, 2004, 1988, 2021, 2024, 1974, 1995, 1986, 1704, 1705, 1958, 1939, 1829, 1830, 1831, 1832, 1833, 1834, 1835, 1836, 1837, 1838, -2]
>TRINITY_GG_1_c5_g1_i1 len=4504 path=[6041:0-35 6115:36-263 6106:264-1833 6074:1834-1857 6114:1858-2394 6025:2395-2418 6105:2419-2523 6113:2524-2547 6120:2548-2576 6104:2577-2600 6124:2601-2737 6119:2738-2761 6103:2762-2863 6111:2864-2887 6123:2888-3040 6118:3041-3106 6110:3107-3115 6122:3116-3458 6117:3459-3479 6126:3480-3499 6125:3500-3523 6121:3524-3669 6108:3670-4076 6127:4077-4100 6128:4101-4123 6129:4124-4147 6130:4148-4257 6035:4258-4503] [-1, 6041, 6115, 6106, 6074, 6114, 6025, 6105, 6113, 6120, 6104, 6124, 6119, 6103, 6111, 6123, 6118, 6110, 6122, 6117, 6126, 6125, 6121, 6108, 6127, 6128, 6129, 6130, 6035, -2]
mrtnbns commented 8 years ago

Hey Francicco,

At the moment, we don't have the capacity to look into this problem.

find_ggreadlist.pl works fine with Trinity's sample data for genome-guided assemblies and with FRAMA's example data (reads mapped to the human genome). We assumed that reads are partitioned into loci based on the provided alignment file and that each locus corresponds to one "batch" of Trinity contigs (so one read file belongs to TRINITY_GG_1_*, another to TRINITY_GG_2_*, ...). However, judging from your warnings, that doesn't seem to be the case.
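You can compare the assumed layout (one read partition per TRINITY_GG_<n> batch) with an actual run. Count the contigs per batch prefix in the final assembly, then count the per-partition assemblies. Both helpers are hypothetical sketches. The partition file pattern is taken from the find output earlier in this thread. Empty partition files, if any, are counted too. The batch list is sorted lexically, so TRINITY_GG_10 appears before TRINITY_GG_2.

```shell
# Hypothetical helper: print "<batch prefix> <number of contigs>" for each
# TRINITY_GG_<n> prefix found in the headers of an assembly FASTA file.
gg_batch_counts() {
    grep '^>' "$1" |
        sed -n 's/^>\(TRINITY_GG_[0-9][0-9]*\)_.*/\1/p' |
        sort | uniq -c |
        awk '{ print $2, $1 }'
}

# Hypothetical helper: number of per-partition assemblies below directory $1.
count_partitions() {
    find "$1" -name '*.trinity.reads.out.Trinity.fasta' | wc -l | tr -d ' '
}
```

For example: gg_batch_counts FRAMA.GG.out/trinity/Trinity.fasta | wc -l, compared with count_partitions FRAMA.GG.out/trinity/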