chadlaing / Panseq

Pan-genomic sequence analysis
http://lfz.corefacility.ca/panseq
GNU General Public License v3.0

I can run the test but not the experiment with my genomes #26

Closed EmilianoMaresi closed 5 years ago

EmilianoMaresi commented 5 years ago

Hi, I'm trying to run Panseq on my genomes, but all I get is warnings flooding stdout:

Overwrite set to true. Deleting directory ./synthetic_output/
2019/04/08 11:31:10 INFO | NovelIterator.pm:186> We have 4961 genomes this run
2019/04/08 11:31:11 WARN | CombineFilesIntoSingleFile.pm:83> Skipping ./synthetic_output/62624a59a83356893f0365cef7132da6_965b7fb4409c563321d33d48470db364_NR as it has size of 0
2019/04/08 11:31:11 WARN | CombineFilesIntoSingleFile.pm:83> Skipping ./synthetic_output/2f922fadbde7c421a86f1c8b15f57ef7_49afd7839dd1e47eda0d7b9e263f94c6_NR as it has size of 0
ERROR: Could not parse delta file, ./synthetic_output/724df4764c6cb27f57630bb7f4db03ff_c64063829facece23b9feafb1fb74417.delta error no: 400
ERROR: Could not parse delta file, ./synthetic_output/c838799005c75d029fefb614c3c2b511_7e75d0552ec714b52526e79c0bc219ef.delta error no: 400

In my working directory I get a nucmer.error file that says:

20190408|113121| 613| ERROR: The following critical files could not be used
20190408|113121| 613| /home/steve/Scrivania/MUMmer3.23/aux_bin/postnuc
20190408|113121| 613| /home/steve/Scrivania/MUMmer3.23/aux_bin/prenuc
20190408|113121| 613| /home/steve/Scrivania/MUMmer3.23/mgaps
20190408|113121| 613| /home/steve/Scrivania/MUMmer3.23/mummer
20190408|113121| 613| Check your paths and file permissions and try again

My permissions are all enabled on those files:

drwxrwxrwx 2 steve steve 4096 apr 5 15:14 .
drwxrwxrwx 6 steve steve 4096 apr 5 15:15 ..
-rwxr-xr-x 1 steve steve 76448 apr 5 15:14 postnuc
-rwxr-xr-x 1 steve steve 85152 apr 5 15:14 postpro
-rwxr-xr-x 1 steve steve 18320 apr 5 15:14 prenuc
-rwxr-xr-x 1 steve steve 26800 apr 5 15:14 prepro
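The listing above covers only the aux_bin directory, while the error also names mgaps and mummer one level up. A minimal sketch, assuming Python 3 is available, that checks all four paths from nucmer.error in one go. The function name `check_executables` is made up for this illustration and is not part of Panseq or MUMmer.

```python
import os
from pathlib import Path


def check_executables(paths):
    """Return a dict mapping each path to a short status string."""
    report = {}
    for p in map(Path, paths):
        if not p.exists():
            report[str(p)] = "missing"
        elif not p.is_file():
            report[str(p)] = "not a regular file"
        elif not os.access(p, os.X_OK):
            report[str(p)] = "not executable"
        else:
            report[str(p)] = "ok"
    return report


if __name__ == "__main__":
    # Paths copied from the nucmer.error output quoted in this issue.
    mummer_dir = Path("/home/steve/Scrivania/MUMmer3.23")
    wanted = [
        mummer_dir / "aux_bin" / "postnuc",
        mummer_dir / "aux_bin" / "prenuc",
        mummer_dir / "mgaps",
        mummer_dir / "mummer",
    ]
    for path, status in check_executables(wanted).items():
        print(f"{status:>20}  {path}")
```

A "missing" result would suggest that the location nucmer is looking in is not where the folder currently lives, which is a different problem from file permissions.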

My settings file is as follows:

queryDirectory ./synthetic_genomes
baseDirectory ./synthetic_output
numberOfCores 6
mummerDirectory ./MUMmer3.23/
blastDirectory ./blast2.9/
minimumNovelRegionSize 500
novelRegionFinderMode no_duplicates
muscleExecutable ./muscle3.8.31/muscle3.8.31
fragmentationSize 500
percentIdentityCutoff 85
coreGenomeThreshold 2
runMode pan
cdhitDirectory ./cd-hit-v4.8.1
overwrite 1
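All tool locations in these settings are relative paths, whereas nucmer.error reports an absolute path under /home/steve/Scrivania/. A rough sketch for printing where each relative entry actually points from a given working directory, so the two can be compared. It assumes one whitespace-separated key/value pair per line; the helper names, the `PATH_KEYS` set and the skipping of `#` lines are assumptions for illustration, not Panseq behaviour.

```python
from pathlib import Path

# Settings keys from this issue whose values are filesystem paths.
PATH_KEYS = {
    "queryDirectory", "baseDirectory", "mummerDirectory",
    "blastDirectory", "muscleExecutable", "cdhitDirectory",
}


def parse_settings(text):
    """Parse 'key value' lines into a dict (assumed whitespace-separated)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skipping '#' lines is an assumption, not Panseq's rule
        parts = line.split(None, 1)
        if len(parts) == 2:
            settings[parts[0]] = parts[1].strip()
    return settings


def resolve_paths(settings, base_dir):
    """Return absolute versions of the path-valued settings."""
    return {
        key: str((Path(base_dir) / value).resolve())
        for key, value in settings.items()
        if key in PATH_KEYS
    }


if __name__ == "__main__":
    example = "mummerDirectory ./MUMmer3.23/\nnumberOfCores 6\n"
    for key, path in resolve_paths(parse_settings(example), Path.cwd()).items():
        print(f"{key}\t{path}\texists={Path(path).exists()}")
```

Run from the directory where Panseq is launched, this shows whether `./MUMmer3.23/` resolves to the same location that appears in nucmer.error.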

I'm using 10 genomes (SE001 to SE010). Each genome is a collection of genes in FASTA format, e.g.:

G1_SE001
....
G2_SE001
....

I use BLAST 2.9.0+, MUMmer3.23, muscle3.8.31 and cd-hit-v4.8.1. All of them are folders inside the Panseq-master folder (I simply dragged them in). I did this so that I could run everything from inside the Panseq-master folder.

I also used the online version of Panseq ( https://lfz.corefacility.ca/panseq/page/pan.html ) and got a good result, so I guess my input files are accepted.

Thank you

chadlaing commented 5 years ago

Hi @Fogatogithub,

Do you really have 4961 genomes that you are analyzing? If there are multiple genomes in the same file, each contig needs to be uniquely identified, but also labelled as part of a genome. The README has an example of how to format the headers for this.

Let me know if this helps, Chad
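To illustrate the point about headers: with one FASTA record per gene and no shared genome label, each gene is counted as its own genome, which would explain 4961 "genomes" from 10 input files. Below is a minimal sketch of relabelling such headers. It assumes records named like `>G1_SE001`, where the genome name follows the last underscore, and it writes headers as `>lcl|genome|contig`. That target pattern is an assumption here and should be checked against the README; the function names are invented for this example.

```python
def relabel_headers(fasta_text):
    """Rewrite '>G1_SE001' style headers as '>lcl|SE001|G1_SE001'."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            record_id = fields[0] if fields else ""
            # Assumption: the genome name is the part after the last underscore.
            genome = record_id.rsplit("_", 1)[-1]
            out.append(f">lcl|{genome}|{record_id}")
        else:
            out.append(line)
    return "\n".join(out) + "\n"


def genome_names(fasta_text):
    """Collect the distinct genome labels from '>lcl|genome|contig' headers."""
    names = set()
    for line in fasta_text.splitlines():
        if line.startswith(">lcl|"):
            names.add(line.split("|")[1])
    return names


if __name__ == "__main__":
    example = ">G1_SE001\nACGT\n>G2_SE001\nGGCC\n>G1_SE002\nTTAA\n"
    relabelled = relabel_headers(example)
    print(relabelled, end="")
    print(sorted(genome_names(relabelled)))
```

After relabelling, the number of distinct genome labels should match the number of genomes actually being compared, 10 in this case.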