bogemad / snpiphy

An automated snp phylogeny pipeline
GNU General Public License v3.0

conda install and parallel mapping #4

Closed by iwangtoknow 6 years ago

iwangtoknow commented 6 years ago

Hi Daniel, I installed snpiphy for the first time today through conda. After installation, the run stopped because sra-tools was missing, which was easy to fix with conda install -y sra-tools. Could you add sra-tools to your bioconda build? Thanks for your work.

When running the program, I found that the first step, calling the core genome, processes the isolate genomes one at a time. I have 500+ genome assemblies; if the program could process isolates in parallel (on different cores), snippy's performance would be much better than it is now. I checked, and my CPU is more than 70% idle most of the time.
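To illustrate the kind of per-isolate parallelism being requested, here is a minimal Python sketch. It is not snpiphy's actual implementation: process_isolate and run_all are hypothetical names, and process_isolate merely stands in for whatever per-isolate core-genome call the pipeline makes.

```python
from concurrent.futures import ThreadPoolExecutor


def process_isolate(path):
    """Hypothetical stand-in for one per-isolate core-genome call."""
    # A real pipeline would launch an external tool here (e.g. via subprocess).
    return f"{path}.done"


def run_all(paths, jobs=4):
    """Process isolates concurrently, returning results in input order."""
    # Threads are sufficient when each task mostly waits on an external process.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(process_isolate, paths))


if __name__ == "__main__":
    print(run_all(["isolate_1.fasta", "isolate_2.fasta"], jobs=2))
```

With 500+ assemblies, the jobs value would typically be chosen so that jobs multiplied by the threads used per isolate does not exceed the number of available cores.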


I can't find an option for the nucleotide substitution model in snpiphy; I need the GTR gamma model to compute the phylogeny.

Thanks WANG

bogemad commented 6 years ago

Hi Wang,

I've just updated snpiphy to version 0.2, which fixes some bugs and includes a parallel option (-j) and a model selection option (-m), as you requested. I've also removed the dependency check for sra-tools from the code, since sra-tools wasn't actually used.

Hopefully, this works for you. Let me know if it doesn't.

Cheers,

Daniel

bogemad commented 6 years ago

Also added version 0.2 to bioconda

iwangtoknow commented 6 years ago

Thanks, Daniel. When I opened the issues page of snpiphy, my issue had disappeared, which astonished me. You could leave the issue open until I've checked this; thanks anyway. There is another issue I ran into today. Give me a minute to check whether it is still there in v0.2.

WANG

iwangtoknow commented 6 years ago

Hi Daniel, OK, I checked and the code has been commented out. What happened to the function find_source_file in utils.py? I have read snpiphy.py and I'm not sure what you intend to do with sequences whose coverage is too low, in snpiphy.py lines 126 and 127:

# reads_file = snpiphy.find_source_file(line_data[0], self.reads_dir)
# moved_reads_file = os.path.join(self.excluded_seqs, os.path.basename(reads_file))

Actually, in snpiphy.py v0.1 as installed from bioconda, there is a bug.


Another question: I work with microbial genome sequences, mainly Streptococcus spp. When I want to compute a phylogenetic tree for a species or a group of isolates, I need the best-fit nucleotide substitution model, which I can find with jModelTest. Would it be possible to add jModelTest to your pipeline? This is not a strong request. (Links: jModelTest on GitHub; jModelTest on evomics.)

Best regards.

WANG

bogemad commented 6 years ago

The commented-out code was a bug left over from the pipeline I used as a base for this one. It moved the reads file from the user's reads directory to the excluded sequences directory, which is not what I wanted it to do.

jModelTest is probably outside the scope of this pipeline. I really just want it to be a fast, automated method of building a tree from a set of genome reads or assemblies.

iwangtoknow commented 6 years ago

Thanks Daniel.

WANG