biologger / speciesprimer

The SpeciesPrimer pipeline is intended to help researchers find specific primer pairs for the detection and quantification of bacterial species in complex ecosystems.
GNU General Public License v3.0

primer pairs left after secondary amplicon QC #14

Open Longhx1112 opened 2 years ago

Longhx1112 commented 2 years ago

core genes: 2017
single copy core genes: 1836
Number of conserved sequences: 1812
species specific conserved sequences: 536
potential primer pair(s): 4578
primer pairs with good target binding: 4260
primer pairs left after non-target QC: 615
primer pairs left after secondary amplicon QC: 0
primer pairs left after mfold: 0
primer pairs left after primer QC: 0

Hi, I have a question about using SpeciesPrimer. "primer pairs left after secondary amplicon QC" is zero. Which parameters can I modify to change this? I have already tried "ignore_qc" and "skip_tree", but they made no difference. Looking forward to your reply, thank you very much!

biologger commented 2 years ago

Hi, the ignore_qc and skip_tree options only affect the input quality control, not the primer quality control. The secondary amplicon check takes 10 input assemblies and uses MFEprimer to check that each primer pair creates only one amplicon per assembly.
You can check the results in the /primerdesign/your_target/Pangenome/results/primer/primerQC/MFEprimer_assembly.csv file. A less stringent selection can be achieved by passing a lower value to the --mfethreshold option. The default is 90; you could also try 85 or 80, but I would not recommend going below 70. If MFEprimer_assembly.csv is empty, there is probably a problem with the database.
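A quick way to tell whether MFEprimer_assembly.csv is empty is to count its non-blank rows. The snippet below is only a minimal sketch: the helper name `count_result_rows` is hypothetical, it is not part of SpeciesPrimer, and it makes no assumption about the column layout or about whether the file has a header row.

```python
import csv


def count_result_rows(csv_path):
    """Count the non-blank rows in a CSV file.

    Hypothetical helper, not part of SpeciesPrimer. A header row, if the
    file has one, is counted like any other row.
    """
    with open(csv_path, newline="") as handle:
        return sum(
            1
            for row in csv.reader(handle)
            if any(cell.strip() for cell in row)
        )


# Example call (the path follows the location given above; adjust
# "your_target" to your own run):
# count_result_rows("/primerdesign/your_target/Pangenome/results/primer/primerQC/MFEprimer_assembly.csv")
```

A result of 0 would correspond to the empty-file case mentioned above, which points to the database rather than to the threshold.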

lanying commented 2 years ago

Why does the secondary amplicon check use only 10 input assemblies?

biologger commented 2 years ago

It is a matter of speed and computing power. With, for example, 500 input assemblies, the MFEprimer database would get too large and the QC would take forever. The pipeline selects the 10 assemblies according to their completeness: Complete Genomes > Chromosome > Scaffolds > Contigs
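The selection by completeness can be pictured as a ranked sort followed by a cut at 10. This is only an illustrative sketch and not the code from speciesprimer.py: the function name, the `(name, level)` tuples and the level labels (written here in the NCBI assembly-level style) are all assumptions.

```python
# Assumed ranking, most complete first. The labels are illustrative and
# may not match the strings the pipeline uses internally.
LEVEL_RANK = {
    "Complete Genome": 0,
    "Chromosome": 1,
    "Scaffold": 2,
    "Contig": 3,
}


def select_reference_assemblies(assemblies, limit=10):
    """Return the names of the `limit` most complete assemblies.

    `assemblies` is a list of (name, level) tuples. Unknown levels sort
    last. sorted() is stable, so assemblies of the same level keep their
    input order.
    """
    ranked = sorted(
        assemblies,
        key=lambda item: LEVEL_RANK.get(item[1], len(LEVEL_RANK)),
    )
    return [name for name, _level in ranked[:limit]]
```

With 500 input assemblies, such a function would still return only 10 names, which keeps the MFEprimer database small.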

The number of assemblies can be changed by editing the speciesprimer.py script, in the PrimerQualityControl class:

class PrimerQualityControl:
    def __init__(self, configuration):
        self.referencegenomes = 10  # <-- change this number

To check more than 20 input assemblies, I would recommend splitting the assemblies into several DBs to speed up the database indexing; however, it will still take a lot of time.
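Splitting the assemblies into several DBs amounts to dividing the list into fixed-size batches and building one database per batch. A minimal sketch follows, assuming a hypothetical helper `split_into_batches` and a batch size of 10; neither exists in the pipeline, and the database indexing itself is not shown.

```python
def split_into_batches(assemblies, batch_size=10):
    """Split a list of assemblies into consecutive batches.

    Hypothetical helper, not part of SpeciesPrimer. Each batch would be
    indexed as its own database; the last batch may be smaller.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        assemblies[start:start + batch_size]
        for start in range(0, len(assemblies), batch_size)
    ]
```

For 25 assemblies and a batch size of 10, this gives three batches of 10, 10 and 5 assemblies.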

Maybe an additional option to define the number of assemblies can be implemented in a future version.