dcouvin / CRISPRCasFinder

A Perl script for identifying CRISPR arrays and associated Cas proteins in DNA sequences
https://crisprcas.i2bc.paris-saclay.fr
GNU General Public License v3.0

Errors when using --faa and --gff options #24

Status: Open. jvera888 opened this issue 3 years ago.

jvera888 commented 3 years ago

Hi, I'm trying to run the Singularity version of CRISPRCasFinder (version 4.2.20) with the -faa and -gff options (I generated my GFF file using the DFAST annotator; see below), but I'm getting the following errors:

COMMAND USED:
sudo singularity exec -B $PWD CrisprCasFinder.simg perl /usr/local/CRISPRCasFinder/CRISPRCasFinder.pl -so /usr/local/CRISPRCasFinder/sel392v2.so -cf /usr/local/CRISPRCasFinder/CasFinder-2.0.3 -drpt /usr/local/CRISPRCasFinder/supplementary_files/repeatDirection.tsv -rpts /usr/local/CRISPRCasFinder/supplementary_files/Repeat_List.csv -cas -def G -out CrisprCasFinder2 -in MB0146_2.fasta -faa MB0146_protein_2.fasta -gff MB0146_short_2.gff --keep

STDOUT:

################################################################

--> Welcome to /usr/local/CRISPRCasFinder/CRISPRCasFinder.pl (version 4.2.20)

################################################################

vmatch2 is...............OK
mkvtree2 is...............OK
vsubseqselect2 is...............OK
fuzznuc (from emboss) is...............OK
needle (from emboss) is...............OK

[23:39:36] ---> Results will be stored in CrisprCasFinderOut

Sequence number 1.. ( Input file: ppMB0146_1.fna, Sequence ID: ppMB0146_1, Sequence name = Unknown )
Nb of CRISPRs in this sequence = 0

prodigal installation is.............OK
macsyfinder installation is...........OK
MacSyFinder's results will be stored in MB0146_2_22_10_2020_23_39_36/casfinder_ppMB0146_1/
Analysis launched on /home/cris/installs/crisprcasfinder/MB0146_protein_2.fasta for system(s):


Building reports of detected systems


System: General-Class1 (General-Class1_putative)

SequenceID Cas-type/subtype Gene status System Type Begin End Strand Other_information

Use of uninitialized value within %hashGeneType in concatenation (.) or string at /usr/local/CRISPRCasFinder/CRISPRCasFinder.pl line 1778, line 92.
Use of uninitialized value within %hashGeneBegin in concatenation (.) or string at /usr/local/CRISPRCasFinder/CRISPRCasFinder.pl line 1779, line 92.
Use of uninitialized value within %hashGeneEnd in concatenation (.) or string at /usr/local/CRISPRCasFinder/CRISPRCasFinder.pl line 1780, line 92.
Use of uninitialized value within %hashGeneStrand in concatenation (.) or string at /usr/local/CRISPRCasFinder/CRISPRCasFinder.pl line 1781, line 92.
Use of uninitialized value within %hashGeneOther in concatenation (.) or string at /usr/local/CRISPRCasFinder/CRISPRCasFinder.pl line 1782, line 92.
MB0146_17175 Cas2_0_I-II-III-V accessory General-Class1
Use of uninitialized value within %hashGeneType in concatenation (.) or string at /usr/local/CRISPRCasFinder/CRISPRCasFinder.pl line 1790, line 92.
Use of uninitialized value within %hashGeneBegin in concatenation (.) or string at /usr/local/CRISPRCasFinder/CRISPRCasFinder.pl line 1791, line 92.
Use of uninitialized value within %hashGeneEnd in concatenation (.) or string at /usr/local/CRISPRCasFinder/CRISPRCasFinder.pl line 1792, line 92.
Use of uninitialized value within %hashGeneStrand in concatenation (.) or string at /usr/local/CRISPRCasFinder/CRISPRCasFinder.pl line 1793, line 92.
Use of uninitialized value within %hashGeneOther in concatenation (.) or string at /usr/local/CRISPRCasFinder/CRISPRCasFinder.pl line 1794, line 92.
Use of uninitialized value in concatenation (.) or string at /usr/local/CRISPRCasFinder/CRISPRCasFinder.pl line 1797, line 92.
Use of uninitialized value in concatenation (.) or string at /usr/local/CRISPRCasFinder/CRISPRCasFinder.pl line 1797, line 92.
Use of uninitialized value in concatenation (.) or string at /usr/local/CRISPRCasFinder/CRISPRCasFinder.pl line 1797, line 92.
Use of uninitialized value in concatenation (.) or string at /usr/local/CRISPRCasFinder/CRISPRCasFinder.pl line 1800, line 92.
Use of uninitialized value in concatenation (.) or string at /usr/local/CRISPRCasFinder/CRISPRCasFinder.pl line 1800, line 92.
Use of uninitialized value in concatenation (.) or string at /usr/local/CRISPRCasFinder/CRISPRCasFinder.pl line 1800, line 92.
Use of uninitialized value in concatenation (.) or string at /usr/local/CRISPRCasFinder/CRISPRCasFinder.pl line 1846.
Use of uninitialized value in concatenation (.) or string at /usr/local/CRISPRCasFinder/CRISPRCasFinder.pl line 1846.
Use of uninitialized value $_[1] in numeric gt (>) at /usr/local/CRISPRCasFinder/CRISPRCasFinder.pl line 3521.
Use of uninitialized value $_[0] in numeric gt (>) at /usr/local/CRISPRCasFinder/CRISPRCasFinder.pl line 3521.
Use of uninitialized value $_[1] in numeric gt (>) at /usr/local/CRISPRCasFinder/CRISPRCasFinder.pl line 3516.
Use of uninitialized value $_[0] in numeric gt (>) at /usr/local/CRISPRCasFinder/CRISPRCasFinder.pl line 3516.
Use of uninitialized value $beginCasCluster in concatenation (.) or string at /usr/local/CRISPRCasFinder/CRISPRCasFinder.pl line 1867.
Use of uninitialized value $endCasCluster in concatenation (.) or string at /usr/local/CRISPRCasFinder/CRISPRCasFinder.pl line 1867.

Summary system General-Class1:begin=;end=;sequenceID=ppMB0146_1

Use of uninitialized value $beginCasCluster in concatenation (.) or string at /usr/local/CRISPRCasFinder/CRISPRCasFinder.pl line 1868.
Use of uninitialized value $endCasCluster in concatenation (.) or string at /usr/local/CRISPRCasFinder/CRISPRCasFinder.pl line 1868.
Use of uninitialized value $beginCasCluster in concatenation (.) or string at /usr/local/CRISPRCasFinder/CRISPRCasFinder.pl line 1871.
Use of uninitialized value $endCasCluster in concatenation (.) or string at /usr/local/CRISPRCasFinder/CRISPRCasFinder.pl line 1871.
Use of uninitialized value $beginCasCluster in concatenation (.) or string at /usr/local/CRISPRCasFinder/CRISPRCasFinder.pl line 1896.
Use of uninitialized value $endCasCluster in concatenation (.) or string at /usr/local/CRISPRCasFinder/CRISPRCasFinder.pl line 1896.

Nb of Cas in this sequence = 1

Statistics on CRISPRs orientation by CRISPRCasFinder vs. CRISPRDirection

Total number of CRISPRs arrays found = 0
Number of perfect macthes between CRISPRCasFinder and CRISPRDirection = 0
Number of Forward by CRISPRCasFinder = 0
Number of Forward by CRISPRDirection = 0
Number of Reverse by CRISPRCasFinder = 0
Number of Reverse by CRISPRDirection = 0

Number of unoriented by CRISPRCasFinder = 0
Number of unoriented by CRISPRDirection = 0
Orientations count file created: MB0146_2_22_10_2020_23_39_36/crisprs_orientations_count.tsv

Secondary folders/files (Prodigal, CasFinder, rawFASTA, CRISPRFinderProperties) have been created

All CRISPRs = 0
All Cas = 1

[23:39:37] Thank you for using /usr/local/CRISPRCasFinder/CRISPRCasFinder.pl! Thank you for your patience!

[23:39:37] The script lasted: 0 year(s) 0 month(s) 0 day(s) , 0 hour(s) 0 minute(s) 1 second(s)

Here is (part of) the GFF I'm using:

##gff-version 3
ppMB0146_1	GAnn	plasmid	1	48031	.	.	.	ID=ppMB0146_1;Name=ppMB0146_1;circular=True;
ppMB0146_1	Prodigal:2.6.3	CDS	174	404	.	+	0	ID=MB0146_16925;Name=MB0146_16925;inference=Prodigal ab initio prediction;product=hypothetical protein;
ppMB0146_1	Prodigal:2.6.3	CDS	1015	1524	.	+	0	ID=MB0146_16930;Name=MB0146_16930;inference=INSD:KOR93936.1;product=dUTPase;
ppMB0146_1	Prodigal:2.6.3	CDS	1521	1832	.	+	0	ID=MB0146_16935;Name=MB0146_16935;inference=Prodigal ab initio prediction;product=hypothetical protein;
ppMB0146_1	Prodigal:2.6.3	CDS	1989	2291	.	+	0	ID=MB0146_16940;Name=MB0146_16940;inference=Prodigal ab initio prediction;product=hypothetical protein;
ppMB0146_1	Prodigal:2.6.3	CDS	2288	2638	.	+	0	ID=MB0146_16945;Name=MB0146_16945;inference=Prodigal ab initio prediction;product=hypothetical protein;
ppMB0146_1	Prodigal:2.6.3	CDS	2643	3035	.	+	0	ID=MB0146_16950;Name=MB0146_16950;inference=Prodigal ab initio prediction;product=hypothetical protein;
ppMB0146_1	Prodigal:2.6.3	CDS	3025	3279	.	+	0	ID=MB0146_16955;Name=MB0146_16955;inference=Prodigal ab initio prediction;product=hypothetical protein;
ppMB0146_1	Prodigal:2.6.3	CDS	3285	4034	.	+	0	ID=MB0146_16960;Name=MB0146_16960;inference=RefSeq:WP_011459086.1;product=thymidylate synthase (FAD);EC_number=2.1.1.148;
ppMB0146_1	Prodigal:2.6.3	CDS	4047	4478	.	+	0	ID=MB0146_16965;Name=MB0146_16965;inference=INSD:KOR93955.1;product=RinA family phage transcriptional regulator;

Could there be something in the GFF that is causing these errors (they appear to be related to parsing the GFF)? Please let me know if you need more information, and thank you for your help!
Cris
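One way to test this hypothesis: each "within %hashGene..." warning above means a Perl hash lookup returned an undefined value, which typically happens when the lookup key is absent from the hash. If (as seems plausible, but is not confirmed in this thread) those hashes are keyed by protein identifiers taken from the GFF `ID=` attributes, then a protein in the `-faa` file with no matching GFF `ID`, such as the hit `MB0146_17175`, would produce exactly these empty fields. The sketch below is a hypothetical `check_ids.py`, not part of CRISPRCasFinder. It assumes the protein ID is the first whitespace-delimited token of each FASTA header, and it lists proteins whose ID does not appear on any GFF `CDS` line.

```python
"""Hypothetical diagnostic: list -faa proteins that have no matching ID= in the -gff file."""
import sys


def faa_ids(lines):
    """Protein IDs, taken as the first whitespace-delimited token of each '>' header."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.append(tokens[0])
    return ids


def gff_cds_ids(lines):
    """Set of ID= values from column 9 of every CDS feature in a GFF3 file."""
    ids = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue  # not a tab-separated CDS feature line
        for attr in cols[8].split(";"):
            key, _, value = attr.strip().partition("=")
            if key == "ID":
                ids.add(value)
    return ids


def missing_ids(faa_lines, gff_lines):
    """FAA protein IDs (in file order) with no matching CDS ID in the GFF."""
    known = gff_cds_ids(gff_lines)
    return [pid for pid in faa_ids(faa_lines) if pid not in known]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_ids.py proteins.faa annotation.gff
    with open(sys.argv[1]) as faa, open(sys.argv[2]) as gff:
        missing = missing_ids(faa, gff)
    for pid in missing:
        print(pid)
    print(f"{len(missing)} protein ID(s) have no matching GFF CDS ID", file=sys.stderr)
```

For example: `python check_ids.py MB0146_protein_2.fasta MB0146_short_2.gff`. An empty result only rules out this one cause. If the FAA headers and GFF IDs follow different naming schemes (as with raw Prodigal output), every protein will be reported, and that alone does not show how CRISPRCasFinder matches them.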

jvera888 commented 3 years ago

Hi again,
To save some time, I thought I'd mention that this run was for a single plasmid, but I get the same type of errors for any and all of my sequences. The errors seem to be greatly compounded when I give it a genome with multiple chromosomes/plasmids. Here is some of the resulting output; as you can see, several fields are missing:
CRISPR-Cas_SUMMAR.tsv:

Sequence(s)	CRISPR array(s)	Nb CRISPRs	Evidence-levels	Cas cluster(s)	Nb Cas	Cas Types/Subtypes
ppMB0146_1		0	Nb_arrays_evidence-level_1=0,Nb_arrays_evidence-level_2=0,Nb_arrays_evidence-level_3=0,Nb_arrays_evidence-level_4=0	General-Class1[;],	1	General-Class1 (n=1),

Cas_REPORT.tsv:

############################################

ppMB0146_1 ( Unknown )

System: General-Class1 (General-Class1_putative)

SequenceID

MB0146_17175

Summary system General-Class1:begin=;end=:{sequenceID=ppMB0146_1} : [Cas2_0_I-II-III-V (,,)]
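Both summary lines in this thread (the STDOUT one and the Cas_REPORT.tsv one) carry empty `begin=` and `end=` fields. For multi-sequence runs, where the errors reportedly compound, a small scan can list every affected system. This is a minimal sketch, assuming the two `Summary system ...` layouts shown here are the ones to match; the function and script names are illustrative and not part of CRISPRCasFinder.

```python
import re
import sys

# Matches both layouts seen in this thread:
#   Summary system General-Class1:begin=;end=;sequenceID=ppMB0146_1
#   Summary system General-Class1:begin=;end=:{sequenceID=ppMB0146_1} : [...]
SUMMARY_RE = re.compile(
    r"Summary system (?P<system>[^:]+):"
    r"begin=(?P<begin>[^;]*);end=(?P<end>[^;:]*)"
    r"[;:]\{?sequenceID=(?P<seq>[^};\s]+)"
)


def systems_without_coordinates(lines):
    """Yield (system, sequence ID) for each summary line with an empty begin or end."""
    for line in lines:
        match = SUMMARY_RE.search(line)
        if match and not (match.group("begin") and match.group("end")):
            yield match.group("system"), match.group("seq")


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python find_empty_coords.py Cas_REPORT.tsv
    with open(sys.argv[1]) as report:
        for system, seq_id in systems_without_coordinates(report):
            print(f"{seq_id}\t{system}")
```

A regex over the raw text is used rather than a TSV parser because the two layouts differ in their separators (`;` versus `:{...}`).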

LOlijslager commented 3 years ago

Hello,

To provide some extra information: I'm having the same error, but with the standalone version. My genome consists of many contigs, and the error only seems to occur when I give it the whole genome, not when I use only the contig of interest. I generate my GFF files with Prodigal.

dcouvin commented 3 years ago

Hi,
Thank you for your messages. I will try to fix this issue in the next release. Do you get an error when running the basic command (without the -faa and -gff options)? Thank you in advance.
Best,
David

LOlijslager commented 3 years ago

Hello,
I don't get any errors without -faa and -gff. I also found that the error disappears if I restrict both my GFF and FAA files to a single contig and change the ID attribute in the GFF file as follows (original line first, modified line second):

contig1_1   Prodigal_v2.6.3 CDS 842 2074    29.4    +   0   ID=4711_1;partial=00;start_type=ATG;rbs_motif=AGGA;rbs_spacer=5-10bp;gc_cont=0.315;conf=99.88;score=29.38;cscore=16.17;sscore=13.21;rscore=7.68;uscore=1.95;tscore=3.59;
contig1_1   Prodigal_v2.6.3 CDS 842 2074    29.4    +   0   ID=1_1;partial=00;start_type=ATG;rbs_motif=AGGA;rbs_spacer=5-10bp;gc_cont=0.315;conf=99.88;score=29.38;cscore=16.17;sscore=13.21;rscore=7.68;uscore=1.95;tscore=3.59;

It does, however, give the following error (although the output still looks okay). I'm not sure whether it's related, but I thought I'd mention it anyway:

 File "~/CRISPRCasFinder-release-4.2.20/macsyfinder-1.0.5/macsypy/report.py", line 258, in extract
    seq_lg, position_hit = my_db[hit_id]
TypeError: 'NoneType' object is not iterable
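The single-contig workaround described above can be scripted. The sketch below is a hypothetical helper, not part of CRISPRCasFinder: it writes one GFF and one FAA file per sequence ID found in the GFF. It assumes each FAA header either starts with the same identifier as a GFF `ID=` attribute or contains an `ID=` field (Prodigal's protein headers look like `>contig_1 # start # end # strand # ID=1_1;...`). It does not split the nucleotide FASTA passed to `-in`, and it does not reproduce the ID renumbering (`4711_1` to `1_1`) reported above, so treat it as a starting point rather than a fix.

```python
"""Hypothetical helper: split a multi-contig GFF3/FAA pair into one pair per sequence."""
import os
import sys
from collections import defaultdict


def header_ids(header):
    """Candidate protein IDs for a FASTA header: first token, then any ID=... value."""
    tokens = header[1:].split()
    candidates = tokens[:1]
    for token in tokens:
        if token.startswith("ID="):
            candidates.append(token[3:].split(";")[0])
    return candidates


def split_by_seqid(gff_lines, faa_lines):
    """Return ({seqid: gff lines}, {seqid: faa lines}, [unassigned protein IDs])."""
    gff_out = defaultdict(list)
    id_to_seq = {}
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # directives are re-added when writing
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a tab-separated feature line
        gff_out[cols[0]].append(line)
        for attr in cols[8].split(";"):
            key, _, value = attr.strip().partition("=")
            if key == "ID":
                id_to_seq[value] = cols[0]

    faa_out = defaultdict(list)
    unassigned = []
    current = None  # seqid of the FASTA record being copied, or None to skip it
    for line in faa_lines:
        if line.startswith(">"):
            candidates = header_ids(line)
            current = next((id_to_seq[c] for c in candidates if c in id_to_seq), None)
            if current is None:
                unassigned.append(candidates[0] if candidates else "")
        if current is not None:
            faa_out[current].append(line)
    return gff_out, faa_out, unassigned


def write_split(gff_path, faa_path, outdir):
    """Write <seqid>.gff and <seqid>.faa into outdir; return unassigned protein IDs."""
    with open(gff_path) as gff, open(faa_path) as faa:
        gff_out, faa_out, unassigned = split_by_seqid(gff, faa)
    os.makedirs(outdir, exist_ok=True)
    for seqid, lines in gff_out.items():
        with open(os.path.join(outdir, f"{seqid}.gff"), "w") as fh:
            fh.write("##gff-version 3\n")
            fh.writelines(lines)
        with open(os.path.join(outdir, f"{seqid}.faa"), "w") as fh:
            fh.writelines(faa_out.get(seqid, []))
    return unassigned


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage: python split_by_contig.py annotation.gff proteins.faa outdir
    for pid in write_split(sys.argv[1], sys.argv[2], sys.argv[3]):
        print(f"no GFF ID found for protein {pid}", file=sys.stderr)
```

Proteins that cannot be matched to any GFF `ID` are reported on stderr instead of being guessed into a contig, since a wrong assignment would reintroduce the mismatch the workaround is meant to avoid.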