B-UMMI / chewBBACA

BSR-Based Allele Calling Algorithm
GNU General Public License v3.0

able to use on assemblies with seq len <100,000 char? #209

Open ojessop opened 1 week ago

ojessop commented 1 week ago

Hi, I would love to be able to run this with WGS sequences, and then again separately with amplicon assemblies, so that ultimately I can make two trees and compare how they cluster depending on input sequence type/information. Chewy works perfectly for WGS sequences; however, when I try to run the same thing with amplicon assemblies, I get this error:

"miniconda3/envs/chewbbaca/lib/python3.10/site-packages/CHEWBBACA/utils/gene_prediction.py:75: UserWarning:

sequence should be at least 100000 characters (71324 found)"

etc. for all assemblies. Could you please advise whether >100,000 is a hard rule, or whether I can override it somehow? I don't mind if the data structure is inferior; I just want to compare.

I prepared my own schema with chewBBACA.py PrepExternalSchema -g cgMLST/

Thank you

ramirma commented 1 week ago

Dear @ojessop,

Thank you for your interest in chewBBACA. I am not sure I understand what the input files are. Are these simply the ORFs obtained by specific PCR? There is a flag, --cds or --cds-input, that allows you to skip the Prodigal/Pyrodigal step and proceed from there. You can get more details here. Will this solve your problem? If not, can you provide us with a more detailed description of the input files and what you are trying to achieve?

Mario
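
For anyone following this thread, one way to put together the requested description of the input files is to summarize each FASTA file and compare its length with the 100000-character figure quoted in the warning. The following is a minimal sketch using only the Python standard library; it is not part of chewBBACA. The function names `fasta_lengths` and `summarize` are hypothetical, and whether the check behind the warning applies to the total length or to each record is an assumption, so the sketch reports both the total and the longest record.

```python
from pathlib import Path

# Figure quoted in the UserWarning above; how chewBBACA applies it
# (total vs. per-record length) is an assumption in this sketch.
MIN_LENGTH = 100_000


def fasta_lengths(path):
    """Return a list of (record_id, sequence_length) pairs for a FASTA file."""
    records = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first whitespace-delimited token as the record ID.
                fields = line[1:].split()
                record_id = fields[0] if fields else f"record_{len(records) + 1}"
                records.append([record_id, 0])
            elif records:
                # Sequence line: add its length to the current record.
                records[-1][1] += len(line)
    return [(record_id, length) for record_id, length in records]


def summarize(path, threshold=MIN_LENGTH):
    """Summarize one FASTA file and flag it if its total length is short."""
    lengths = [length for _, length in fasta_lengths(path)]
    total = sum(lengths)
    return {
        "file": Path(path).name,
        "records": len(lengths),
        "total_length": total,
        "longest_record": max(lengths, default=0),
        "below_threshold": total < threshold,
    }
```

Calling `summarize` on each assembly and pasting the resulting dictionaries into a reply shows at a glance how many records each file holds and how far each falls short of the quoted figure.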