dcouvin / CRISPRCasFinder

A Perl script allowing to identify CRISPR arrays and associated Cas proteins from DNA sequences
https://crisprcas.i2bc.paris-saclay.fr
GNU General Public License v3.0

MSG: No file or directory called #44

Open TKsh6 opened 1 year ago

TKsh6 commented 1 year ago

I want to identify CRISPR arrays and Cas genes in genomes (the bacterial and archaeal RefSeq sets from NCBI), so I ran this command: perl CRISPRCasFinder.pl -in ~/database_db/refseq/archaea_library.fna -so sel392v2.so -cas -keep -out ~/software/crisprcasfinder/test but I got this error:

################################################################
# --> Welcome to CRISPRCasFinder.pl (version 4.3.2)
################################################################

vmatch is...............OK
mkvtree is...............OK
vsubseqselect is...............OK
fuzznuc (from emboss) is...............OK
needle (from emboss) is...............OK

 ---> Results will be stored in /beegfs/home/syl/software/crisprcasfinder/test

  ( Input file: NC_002607.fna, Sequence ID: NC_002607, Sequence name = Halobacterium salinarum NRC-1, complete sequence )
Sequence number 1..
muscle 5.1.linux64 []  132Gb RAM, 72 cores
Built Feb 24 2022 03:16:15
(C) Copyright 2004-2021 Robert C. Edgar.
https://drive5.com

Input: 5 seqs, avg length 35, max 70

00:00 18Mb   CPU has 72 cores, defaulting to 20 threads

WARNING: Max OMP threads 2

00:00 93Mb    100.0% Calc posteriors
00:00 93Mb    100.0% Consistency (1/2)
00:00 93Mb    100.0% Consistency (2/2)
00:00 93Mb    100.0% UPGMA5
00:00 93Mb    100.0% Refining

muscle 5.1.linux64 []  132Gb RAM, 72 cores
Built Feb 24 2022 03:16:15
(C) Copyright 2004-2021 Robert C. Edgar.
https://drive5.com

Input: 2 seqs, avg length 31, max 40

00:00 18Mb   CPU has 72 cores, defaulting to 20 threads

WARNING: Max OMP threads 2

00:00 26Mb    100.0% Calc posteriors
00:00 26Mb    100.0% UPGMA5

muscle 5.1.linux64 []  132Gb RAM, 72 cores
Built Feb 24 2022 03:16:15
(C) Copyright 2004-2021 Robert C. Edgar.
https://drive5.com

Input: 2 seqs, avg length 41, max 41

00:00 18Mb   CPU has 72 cores, defaulting to 20 threads

WARNING: Max OMP threads 2

00:00 26Mb    100.0% Calc posteriors
00:00 26Mb    100.0% UPGMA5

------------- EXCEPTION -------------
MSG: No file or directory called 'NC_002607.fna'
STACK Bio::DB::IndexedBase::new /beegfs/home/syl/anaconda3/envs/crisprcasfinder/lib/perl5/site_perl/Bio/DB/IndexedBase.pm:368
STACK main::reportToGff CRISPRCasFinder.pl:2545
STACK main::makeGff CRISPRCasFinder.pl:2427
STACK toplevel CRISPRCasFinder.pl:653
-------------------------------------

The sequence names in archaea_library.fna look like this:

kraken:taxid|64091|NC_002607.1 Halobacterium salinarum NRC-1, complete sequence
kraken:taxid|64091|NC_001869.1 Halobacterium salinarum NRC-1 plasmid pNRC100, complete sequence
kraken:taxid|64091|NC_002608.1 Halobacterium salinarum NRC-1 plasmid pNRC200, complete sequence
kraken:taxid|273057|NC_002754.1 Saccharolobus solfataricus P2, complete sequence
kraken:taxid|192952|NC_003901.1 Methanosarcina mazei Go1, complete sequence
kraken:taxid|190192|NC_003551.1 Methanopyrus kandleri AV19, complete sequence
kraken:taxid|178306|NC_003364.1 Pyrobaculum aerophilum str. IM2, complete sequence
kraken:taxid|186497|NC_003413.1 Pyrococcus furiosus DSM 3638, complete sequence
kraken:taxid|188937|NC_003552.1 Methanosarcina acetivorans C2A, complete sequence
kraken:taxid|263820|NC_005877.1 Picrophilus torridus DSM 9790, complete sequence

I don't know how to solve this, can you help me?

Yours, TK

dcouvin commented 1 year ago

Hi @TKsh6, thank you for your message. I have never seen this error before. Please try simplifying the sequence IDs and avoiding the dot in them. I hope this solves the problem. Best, David
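One way to try the suggestion above is to rewrite the FASTA headers before running CRISPRCasFinder. The following is only a minimal sketch, not part of CRISPRCasFinder: the function names `simplify_id`, `simplify_fasta`, and `rewrite_file` are hypothetical, and it assumes standard FASTA headers that start with `>` and IDs shaped like the `kraken:taxid|64091|NC_002607.1` examples in this thread. It keeps the last `|`-separated field of each ID and replaces the dot (and any other character outside letters, digits, and `_`) with an underscore. Whether this removes the exception has not been confirmed here.

```python
import re


def simplify_id(header: str) -> str:
    """Return a FASTA header line with a simplified sequence ID.

    Assumption: the ID is the text between '>' and the first space.
    Only the last '|'-separated field is kept, and every character that
    is not a letter, digit, or underscore is replaced by '_'.
    """
    body = header[1:].rstrip("\n")
    seq_id, _, description = body.partition(" ")
    seq_id = seq_id.split("|")[-1]                  # drop the 'kraken:taxid|64091|' prefix
    seq_id = re.sub(r"[^A-Za-z0-9_]", "_", seq_id)  # 'NC_002607.1' becomes 'NC_002607_1'
    return f">{seq_id} {description}".rstrip() + "\n"


def simplify_fasta(lines):
    """Yield the lines of a FASTA file, rewriting only the header lines."""
    for line in lines:
        yield simplify_id(line) if line.startswith(">") else line


def rewrite_file(src_path: str, dst_path: str) -> None:
    """Write a copy of src_path with simplified headers to dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(simplify_fasta(src))
```

For example, `rewrite_file("archaea_library.fna", "archaea_library.simple.fna")` would write a copy with simplified headers (the output file name is illustrative), which can then be passed to `-in`. Replacing the dot with an underscore, rather than dropping the version suffix, keeps IDs unique if two versions of the same accession are present.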