Hi Alex,
I'm using STAR for mapping bacterial genomes.
I'm getting an error when generating the index with the following command:
STAR --runThreadN 30 --runMode genomeGenerate --genomeDir path_to_genome_dir --genomeFastaFiles path_to_fasta.fna --sjdbGTFfile path_to_GFF.gff --sjdbOverhang 149 --sjdbGTFtagExonParentTranscript Parent --sjdbGTFfeatureExon CDS --alignIntronMax 1 --genomeSAindexNbases 12
Genome length: 51,097,461 bp
Read length: 150 bp
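For context on two of the values in the command above: the STAR manual describes `--sjdbOverhang` as ideally read length minus 1, and suggests scaling `--genomeSAindexNbases` down for small genomes to min(14, log2(GenomeLength)/2 - 1). A minimal sketch of that arithmetic for the numbers quoted here (the function names are illustrative and not part of STAR):

```python
import math


def suggested_sa_index_nbases(genome_length: int) -> float:
    """Rule of thumb from the STAR manual: min(14, log2(GenomeLength)/2 - 1)."""
    return min(14.0, math.log2(genome_length) / 2 - 1)


def suggested_sjdb_overhang(read_length: int) -> int:
    """Rule of thumb from the STAR manual: read length minus 1."""
    return read_length - 1


genome_length = 51_097_461  # bp, as reported in this post
read_length = 150           # bp, as reported in this post

print("sjdbOverhang:", suggested_sjdb_overhang(read_length))
print("genomeSAindexNbases (unrounded):",
      round(suggested_sa_index_nbases(genome_length), 2))
```

With these inputs the second value falls between 11 and 12, so the 12 used in the command is close to the suggested value. This is only a sanity check on the parameters and is not claimed to explain the crash.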
The error is the following:
terminate called after throwing an instance of 'std::out_of_range'
what(): vector::_M_range_check
Abortado (`core' generado)
Reading other issues (#84 and #548), I realised that the ninth column of my GFF does not contain "transcript_id" or "gene_id", and that the regions of interest to map are labelled "CDS" rather than "exon" in the third column. I specified this following the instructions in the manual (4-May-2021). I am not sure whether, for GFF3, --sjdbGTFtagExonParentTranscript should be set to "Parent" or to "ID"; I tried both and the error did not change. I also checked that the chromosome names are the same in the sequence and annotation files, which they are, as I merged and converted upstream antiSMASH output (GBK; n=1911) into GFF3 and FASTA formats in the same operation. Any ideas about what might be terminating the process?
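The chromosome-name check mentioned above can be sketched roughly as follows. This is a minimal illustration, not the exact procedure used for this post: it assumes the sequence name is the first whitespace-delimited token of each FASTA header and that the GFF is tab-separated, and the helper names and sample lines are made up for the example.

```python
def fasta_ids(lines):
    """Collect sequence names: the first token after '>' on each header line."""
    ids = set()
    for line in lines:
        if line.startswith(">") and line[1:].strip():
            ids.add(line[1:].split()[0])
    return ids


def gff_seqids(lines):
    """Collect column 1 of every non-comment, non-empty GFF line."""
    ids = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        ids.add(line.split("\t", 1)[0])
    return ids


# Tiny in-memory stand-ins for the real files (hypothetical content).
fasta_lines = [">ISL001_ctg1 some description\n", "ACGTACGT\n"]
gff_lines = [
    "##gff-version 3\n",
    "ISL001_ctg1\tGenBank\tCDS\t886\t1209\t.\t-\t1\tID=ISL001_ctg1_116\n",
]

# Names used in the annotation but absent from the sequence file.
missing = gff_seqids(gff_lines) - fasta_ids(fasta_lines)
print("seqids in GFF but not in FASTA:", sorted(missing))
```

For real files, the same functions can be fed open file handles, for example `fasta_ids(open("path_to_fasta.fna"))`. An empty result means every annotated name has a matching sequence.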
Here is an excerpt showing what the GFF looks like:
seqid | source | type | start | end | score | strand | phase | attributes
-- | -- | -- | -- | -- | -- | -- | -- | --
ISL001_ctg1 | GenBank | cand_cluster | 1 | 21032 | . | + | 1 | ID=furan.cand_cluster;Alias=furan;Name=furan;candidate_cluster_number=1;contig_edge=False;detection_rules=(mmyO or AvrD);kind=single;product=furan;protoclusters=1;tool=antismash
ISL001_ctg1 | GenBank | DNA | 1 | 21032 | . | . | 1 | ID=ISL001_ctg1;Alias=ISL001_ctg1;Name=ISL001_ctg1;Note=contig1.,\n##antiSMASH-Data-START##\nVersion :: 5.1.2\nRun date :: 2020-08-04 00:32:29\nNOTE: This is a single cluster extracted from a larger record!\nOrig. start :: 102528\nOrig. end :: 123560\n##antiSMASH-Data-END##;comment1=\n##antiSMASH-Data-START##\nVersion :: 5.1.2\nRun date :: 2020-08-04 00:32:29\nNOTE: This is a single cluster extracted from a larger record!\nOrig. start :: 102528\nOrig. end :: 123560\n##antiSMASH-Data-END##;date=01-JAN-1980
ISL001_ctg1 | GenBank | protocluster | 1 | 21032 | . | + | 1 | ID=furan;Name=furan;aStool=rule-based-clusters;contig_edge=False;core_location=[112528:113560](-);cutoff=20000;detection_rule=(mmyO or AvrD);neighbourhood=10000;product=furan;protocluster_number=1;tool=antismash
ISL001_ctg1 | GenBank | region | 1 | 21032 | . | + | 1 | ID=furan.region;Alias=furan;Name=furan;candidate_cluster_numbers=1;contig_edge=False;product=furan;region_number=1;rules=(mmyO or AvrD);tool=antismash
ISL001_ctg1 | GenBank | CDS | 886 | 1209 | . | - | 1 | ID=ISL001_ctg1_116;Name=ISL001_ctg1_116;transl_table=11;translation=length.107
ISL001_ctg1 | GenBank | CDS | 1641 | 2252 | . | + | 1 | ID=ISL001_ctg1_117;Name=ISL001_ctg1_117;gene_functions=regulatory (smcogs) SMCOG1016:LuxR family DNA-binding response regulator (Score: 121.3%3B E-value: 5.4e-37);gene_kind=regulatory;transl_table=11;translation=length.203
ISL001_ctg1 | GenBank | CDS | 2506 | 3093 | . | + | 1 | ID=ISL001_ctg1_118;Name=ISL001_ctg1_118;gene_functions=regulatory (smcogs) SMCOG1032:RNA polymerase%2C sigma-24 subunit%2C ECF subfamily (Score: 99.6%3B E-value: 2.7e-30);gene_kind=regulatory;transl_table=11;translation=length.195
ISL001_ctg1 | GenBank | CDS | 3224 | 4726 | . | + | 1 | ID=ISL001_ctg1_119;Name=ISL001_ctg1_119;transl_table=11;translation=length.500
ISL001_ctg1 | GenBank | CDS | 4840 | 5964 | . | + | 1 | ID=ISL001_ctg1_120;Name=ISL001_ctg1_120;transl_table=11;translation=length.374
ISL001_ctg1 | GenBank | CDS | 6324 | 7301 | . | - | 1 | ID=ISL001_ctg1_121;nRPS_PKS=Domain: PKS_ER (23-318). E-value: 4.5e-51. Score: 165.5. Matches aSDomain: nrpspksdomains_ctg1_121_PKS_ER.1,type: other;Name=ISL001_ctg1_121;gene_functions=biosynthetic-additional (smcogs) SMCOG1028:crotonyl-CoA reductase / alcohol dehydrogenase (Score: 286.5%3B E-value: 4.4e-87);gene_kind=biosynthetic-additional;transl_table=11;translation=length.325
ISL001_ctg1 | GenBank | aSDomain | 6348 | 7232 | 165.5 | - | 1 | ID=aSDomain:PKS_ER;aSF=ER configuration inconclusive;Name=ctg1_121;aSDomain=PKS_ER;aSTool=nrps_pks_domains;database=nrpspksdomains.hmm;detection=hmmscan;domain_id=nrpspksdomains_ctg1_121_PKS_ER.1;evalue=4.50E-51;label=ctg1_121_PKS_ER.1;protein_end=318;protein_start=23;tool=antismash;translation=LKLIETDRPVPGPTEILVRVHAAGVNPTDWKTRARGVYVNGVRPPFRLGFDVSGVVEAVGAGVTVFAPGDEVFGMPRFPHPAGAYAEYVTGPARHFTLRPAGQDHIHTAALPLAALTAWQALVDTADIRPGQRVLVHAAAGGVGHLAVQIAKARGAYVIGTARTAKHDFLRGLGADELVDYTQQEFAEVIRDVDVVLDPVGGDCSIRSLRTLRPGGVLISLIPPDETFPAEQARAAGVRAVFMLVEPDQAGLREIAALVDSGQLRAEIAAAVPLEEAAKAHELGETGRTAGKIVLS
ISL001_ctg1 | GenBank | aSModule | 6348 | 7232 | . | + | 1 | ID=GenBank:aSModule:ISL001_ctg1:6348:7232;domains=nrpspksdomains_ctg1_121_PKS_ER.1;incomplete=_no_value;locus_tags=ctg1_121;tool=antismash;type=unknown
ISL001_ctg1 | GenBank | CDS_motif | 6654 | 6683 | -2.0 | - | 1 | ID=ctg1_121.CDS_motif;Alias=ctg1_121;Name=ctg1_121;aSTool=nrps_pks_domains;database=abmotifs;detection=hmmscan;domain_id=nrpspksmotif_ctg1_121_0003;evalue=4.80E+01;label=PKSI-ER_m2;protein_end=216;protein_start=206;tool=antismash;translation=length.11
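In the excerpt above, the CDS rows carry ID and Name attributes but no Parent attribute. A rough way to count how many rows of a given feature type have a given tag is sketched below; it assumes a tab-separated GFF3 with `key=value` pairs separated by `;` in column 9, and the helper names are illustrative. Whether a missing tag is related to the crash is not established here.

```python
def parse_attributes(column9):
    """Split a GFF3 attributes column into a {key: value} dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value
    return attrs


def count_tag(lines, feature, tag):
    """Return (rows_with_tag, rows_without_tag) for rows whose type is `feature`."""
    with_tag = without_tag = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature:
            continue
        if tag in parse_attributes(cols[8]):
            with_tag += 1
        else:
            without_tag += 1
    return with_tag, without_tag


# Two CDS rows taken from the excerpt, re-joined with tabs.
sample = [
    "ISL001_ctg1\tGenBank\tCDS\t886\t1209\t.\t-\t1\t"
    "ID=ISL001_ctg1_116;Name=ISL001_ctg1_116;transl_table=11;translation=length.107\n",
    "ISL001_ctg1\tGenBank\tCDS\t3224\t4726\t.\t+\t1\t"
    "ID=ISL001_ctg1_119;Name=ISL001_ctg1_119;transl_table=11;translation=length.500\n",
]

for tag in ("Parent", "ID"):
    print(tag, count_tag(sample, "CDS", tag))
```

Running the same count over the full file for both "Parent" and "ID" shows which of the two values tried for --sjdbGTFtagExonParentTranscript is actually present on the CDS rows.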