chadlaing / Panseq

Pan-genomic sequence analysis
http://lfz.corefacility.ca/panseq
GNU General Public License v3.0

# Genomes #25

cabeaudoin closed 4 years ago

cabeaudoin commented 5 years ago

Hello,

I hope you all are doing well.

I am trying to run >1,000 genomes using Panseq and have adapted the contig headers to the style listed in the README file, but Panseq still seems to be counting >40,000 genomes for the run. I have listed an example of the headers found in one of my fasta files below. Any help would be greatly appreciated. Thanks!

$ grep ">" GCA_003398285.1_ASM339828v1_genomic_clean.fna | head
>lcl|GCA_003398285.1_ASM339828v1_genomic|contig1
>lcl|GCA_003398285.1_ASM339828v1_genomic|contig2
>lcl|GCA_003398285.1_ASM339828v1_genomic|contig3
>lcl|GCA_003398285.1_ASM339828v1_genomic|contig4
>lcl|GCA_003398285.1_ASM339828v1_genomic|contig5
>lcl|GCA_003398285.1_ASM339828v1_genomic|contig6
>lcl|GCA_003398285.1_ASM339828v1_genomic|contig7
>lcl|GCA_003398285.1_ASM339828v1_genomic|contig8
>lcl|GCA_003398285.1_ASM339828v1_genomic|contig9
>lcl|GCA_003398285.1_ASM339828v1_genomic|contig10

Best, Chris

chadlaing commented 5 years ago

Hi Chris,

That format does indeed look correct. Is it possible there is one malformed file somewhere?

Thanks, Chad
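One way to look for a malformed file, as suggested above, is to count how many distinct genome names each FASTA file contributes. The following is only a sketch: it assumes the genome name is the second pipe-delimited field of headers shaped like `>lcl|genomeName|contigName` (as in the example headers in this thread) and that the files end in `.fna`. The function names are hypothetical and are not part of Panseq.

```python
import sys
from collections import Counter
from pathlib import Path


def genome_names(fasta_lines):
    """Yield one genome name per FASTA header line.

    Assumes headers look like '>lcl|genomeName|contigName'. A header that
    does not match is yielded whole, so it shows up as its own "genome".
    """
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        header = line[1:].strip()
        fields = header.split("|")
        if len(fields) >= 3 and fields[0] == "lcl":
            yield fields[1]
        else:
            yield header


def count_genomes_per_file(directory, pattern="*.fna"):
    """Map each FASTA file name to a Counter of the genome names in it."""
    report = {}
    for path in sorted(Path(directory).glob(pattern)):
        with path.open() as handle:
            report[path.name] = Counter(genome_names(handle))
    return report


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print only the files that do not resolve to exactly one genome name.
    for name, counts in count_genomes_per_file(sys.argv[1]).items():
        if len(counts) != 1:
            print(f"{name}: {len(counts)} distinct genome names")
```

A file that reports more than one genome name (or none) would be a candidate for closer inspection. Note that this only checks headers, not file names.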

cabeaudoin commented 5 years ago

Hey Chad,

Thank you so much for your quick response. I realized that the problem was simply having a "." in the filenames (outside of the .fna). I changed those to "_", as suggested on the FAQs of the website, and it seems to be working! Thanks again and sorry for the mistake.

Best, Chris
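For a large batch of genomes, the rename described above (replacing every "." in the file name except the one before the extension with "_") could be scripted. This is a minimal sketch assuming all input files end in `.fna`. The function names are hypothetical, and the default is a dry run, so nothing is renamed until `dry_run=False` is passed.

```python
from pathlib import Path


def sanitize_filename(name, extension=".fna"):
    """Replace every '.' in the stem with '_', keeping the final extension."""
    if name.endswith(extension):
        stem = name[: -len(extension)]
        return stem.replace(".", "_") + extension
    return name  # files with other extensions are left untouched


def rename_fasta_files(directory, extension=".fna", dry_run=True):
    """Rename files in `directory`; return a list of (old, new) name pairs."""
    changes = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        new_name = sanitize_filename(path.name, extension)
        if new_name == path.name:
            continue  # nothing to change
        target = path.with_name(new_name)
        if target.exists():
            # Refuse to overwrite an existing file.
            raise FileExistsError(f"{target} already exists")
        changes.append((path.name, new_name))
        if not dry_run:
            path.rename(target)
    return changes
```

For example, `sanitize_filename("GCA_003398285.1_ASM339828v1_genomic_clean.fna")` returns `"GCA_003398285_1_ASM339828v1_genomic_clean.fna"`. The sketch changes file names only and leaves the FASTA headers inside the files as they are.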

chadlaing commented 5 years ago

Hi Chris,

I'm glad that it is working for you.

Thanks, Chad

cabeaudoin commented 5 years ago

Hey Chad,

Sorry to be back so soon. I was just wondering if you might know what went wrong during my file execution. In my output directory, I seem to have gotten some ".index" files and some other stuff, but nothing listed from the "output files" section of the README could be found. I tried with just 5 genomes this time. Here is what the output directory looks like:

$ ls -1
944327aa4b46f91b61013c355fc4ee11_9e27ac23e31dbc367f71ac28f143a012_dbtemp.index
ab3814808bfe6fdc84cdb16686577d7c_1d982d7ec09527f4f932d28e466b72ac
GCA_003546285_1_ASM354628v1_genomic_dbtemp.index
GCA_003546425_1_ASM354642v1_genomic_dbtemp.index
GCA_003546445_1_ASM354644v1_genomic_dbtemp.index
Master.log
queryfile_dbtemp
queryfile_dbtemp.index
singleQueryFile.fasta
singleReferenceFile.fasta

And here's what my "settings.txt" file looks like, for reference:

queryDirectory	/home/chris/Documents/genomesqueries
referenceDirectory	/home/chris/Documents/genomes/reference
baseDirectory	/home/chris/Documents/genomes/output
numberOfCores	5
mummerDirectory	/home/chris/software/MUMmer3.23
blastDirectory	/home/chris/software/ncbi-blast-2.7.1+/bin
minimumNovelRegionSize	500
novelRegionFinderMode	unique
muscleExecutable	/usr/bin/
fragmentationSize	500
percentIdentityCutoff	85
coreGenomeThreshold	5
runMode	pan

Any help would be greatly appreciated! Thank you for your time.

Best, Chris

chadlaing commented 5 years ago

Hi Chris,

The program did not run to completion. If you run the tests in t/output.t does everything pass? It could be that one of the external programs isn't recognized.

Thanks, Chad
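As a rough way to check whether the paths in a settings file point where they should, the sketch below parses `key value` lines and tests the path-like entries. It is not Panseq's own validation. It assumes, purely for illustration, that keys ending in `Directory` should be existing directories and keys ending in `Executable` should be executable files. It would, for instance, flag the posted `muscleExecutable /usr/bin/`, which names a directory rather than a program, though the thread does not establish that this is the cause of the failure.

```python
import os
import shutil


def parse_settings(text):
    """Parse 'key value' lines into a dict, skipping blanks and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        settings[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return settings


def check_paths(settings):
    """Return a list of human-readable problems found in path-like settings."""
    problems = []
    for key, value in settings.items():
        # baseDirectory is the output location; this sketch makes no
        # assumption about whether it should already exist, so skip it.
        if key.endswith("Directory") and key != "baseDirectory":
            if not os.path.isdir(value):
                problems.append(f"{key}: '{value}' is not an existing directory")
        elif key.endswith("Executable"):
            # shutil.which returns None for directories and non-executables.
            if os.path.isdir(value) or shutil.which(value) is None:
                problems.append(f"{key}: '{value}' is not an executable file")
    return problems
```

Usage would be along the lines of `check_paths(parse_settings(open("settings.txt").read()))`, where an empty list means none of these particular checks failed.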

cabeaudoin commented 5 years ago

Hey Chad,

Thanks very much again for the quick reply. Everything passes when I run t/output.t, and I even tried running just the test genomes using my setup, but I ended up getting the same results.

On the command line, I'm executing perl panseq.pl settings.txt

I've attached my Master.log for some hopeful clarification. Any thoughts would be greatly appreciated.

Best, Chris

Attachment: Master.log