AGIScuipeng / PMPrimer

Automatically design multiplex PCR primer pairs for diverse templates
MIT License

MUSCLE multiple sequence alignment error #8

Open MAXINELSX opened 9 months ago

MAXINELSX commented 9 months ago

(args_oap) lishuxian@tong1:/mnt/md0/LSX$ ~/primerdesign/PMPrimer/pmprimer.py -f modified.fasta -p notsameseq -a muscle threshold:0.50 merge primer2 --debuglevel 1
Number Of Sequece Data is 144
Everage Length Of Sequece Data is 1236 bp
Number After Duplicate Remove is 144
Number After Keep Majority Length is 139
Number After Keep Different Subspecies When Same Sequece is 139
Number After Remove Unclassfied is 139
Subspecies after cleaning: 1; species after cleaning: 1; genera after cleaning: 1
{'modified.fasta'}
Number After Progress is 139
MUSCLE multiple sequence alignment in progress...
MUSCLE multiple sequence alignment error

N3R1UM commented 9 months ago

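The MUSCLE multiple sequence alignment module calls MUSCLE v5 directly, so you can download MUSCLE from https://drive5.com/muscle5/ and align the file yourself with MUSCLE's -super5 option to check whether the FASTA file is the problem. If aligning directly with MUSCLE works, I'll then look into what is going wrong in PMPrimer.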
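The maintainer's check can also be scripted so that a MUSCLE failure is reported separately from anything PMPrimer does. The Python sketch below assumes only that a MUSCLE v5 binary named `muscle` is on `PATH` and uses MUSCLE v5's `-super5 <input> -output <output>` syntax; `build_super5_command` and `run_super5` are hypothetical helper names, not part of PMPrimer.

```python
import shutil
import subprocess
from pathlib import Path


def build_super5_command(fasta_in: str, aln_out: str, muscle_bin: str = "muscle") -> list[str]:
    """Build a MUSCLE v5 command line that aligns fasta_in with the -super5 algorithm."""
    return [muscle_bin, "-super5", fasta_in, "-output", aln_out]


def run_super5(fasta_in: str, aln_out: str, muscle_bin: str = "muscle") -> None:
    """Run MUSCLE -super5 outside PMPrimer; raise if MUSCLE is missing or fails."""
    if shutil.which(muscle_bin) is None:
        raise FileNotFoundError(f"MUSCLE executable not found on PATH: {muscle_bin}")
    if not Path(fasta_in).is_file():
        raise FileNotFoundError(fasta_in)
    result = subprocess.run(
        build_super5_command(fasta_in, aln_out, muscle_bin),
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # On POSIX, a process killed by a signal (e.g. SIGABRT) has a negative return code.
        raise RuntimeError(f"MUSCLE exited with code {result.returncode}:\n{result.stderr}")
```

For example, `run_super5("modified.fasta", "modified.afa")` either returns quietly or raises with MUSCLE's exit code and stderr, which keeps an aligner crash distinct from a primer-design problem.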

MAXINELSX commented 9 months ago

Yes, I ran MUSCLE last night and it finished without problems, but PMPrimer still fails to generate any results:
lishuxian@tong1:/mnt/md0/LSX/lishuxian/lishuxian/$ ~/primerdesign/PMPrimer/pmprimer.py -f modified.fasta -p notsameseq -a threshold:0.50 merge primer2 --debuglevel 1
Number Of Sequece Data is 289
Everage Length Of Sequece Data is 1179 bp
Number After Duplicate Remove is 289
Number After Keep Majority Length is 289
Number After Keep Different Subspecies When Same Sequece is 289
Number After Remove Unclassfied is 289
Subspecies after cleaning: 1; species after cleaning: 1; genera after cleaning: 1
{'modified.fasta'}
Number After Progress is 289
Detecting all conserved regions in the aligned sequences...
[22, 38] | 1.3917551993042503 | [[1, 19]] | 19 |
[121, 147] | 22.72099462001609 |
[169, 188] | 5.077382054953968 |
[211, 233] | 5.372048576813916 |
[349, 371] | 26.689208484081608 |
[475, 491] | 28.29181221155937 |
Detection of all conserved regions in the aligned sequences complete
List Of Conservative Regions is : [[1, 19], [22, 38], [121, 147], [169, 188], [211, 233], [349, 371], [475, 491]]
List Of Non Conservative Regions is : [[20, 21], [39, 120], [148, 168], [189, 210], [234, 348], [372, 474]]

Designing primers from the conserved regions... 7/7
Primer design from the conserved regions complete

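Secondary primer extraction from the conserved regions... 1/7([2, 19], [2, 19])
Secondary primer extraction from the conserved regions... 2/7([22, 38], [22, 38])
Secondary primer extraction from the conserved regions... 3/7([127, 146], [125, 144])
Secondary primer extraction from the conserved regions... 4/7([171, 188], [171, 188])
Secondary primer extraction from the conserved regions... 5/7([211, 226], [214, 232])
Secondary primer extraction from the conserved regions... 6/7([350, 369], [350, 369])
Secondary primer extraction from the conserved regions... 7/7([475, 491], [475, 491])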

Secondary primer extraction from the conserved regions complete

Generating amplicon candidates...
[169, 188] Or [1, 19] Haplotype Is 9, 20 Overtake Threshold
[211, 233] Or [1, 19] Haplotype Is 9, 10 Overtake Threshold
[349, 371] Or [1, 19] Haplotype Is 9, 28 Overtake Threshold
[475, 491] Or [1, 19] Haplotype Is 9, 15 Overtake Threshold
[169, 188] Or [22, 38] Haplotype Is 13, 20 Overtake Threshold
[211, 233] Or [22, 38] Haplotype Is 13, 10 Overtake Threshold
[349, 371] Or [22, 38] Haplotype Is 13, 28 Overtake Threshold
[475, 491] Or [22, 38] Haplotype Is 13, 15 Overtake Threshold
[349, 371] Or [121, 147] Haplotype Is 16, 28 Overtake Threshold
[475, 491] Or [121, 147] Haplotype Is 16, 15 Overtake Threshold
[349, 371] Or [169, 188] Haplotype Is 20, 28 Overtake Threshold
[475, 491] Or [169, 188] Haplotype Is 20, 15 Overtake Threshold
[349, 371] Or [211, 233] Haplotype Is 11, 28 Overtake Threshold
[475, 491] Or [211, 233] Haplotype Is 11, 15 Overtake Threshold
Amplicon candidate generation complete: 0 in total
[]
No Right Amplicon

MAXINELSX commented 9 months ago

Even a single sequence does not work:

(base) lishuxian@tong1:/mnt/md0/LSX/lishuxian/lishuxian/primerdesign/Long_subdatabase/dnaid$ conda activate args_oap
(args_oap) lishuxian@tong1:/mnt/md0/LSX/lishuxian/lishuxian/primerdesign/Long_subdatabase/dnaid$ muscle -align modified.fasta -output modified.afa

muscle 5.1.linux64 [] 791Gb RAM, 48 cores
Built May 16 2023 07:53:40
(C) Copyright 2004-2021 Robert C. Edgar.
https://drive5.com

Input: 1 seqs, avg length 861, max 861

double free or corruption (out)
Aborted (core dumped)
(args_oap) lishuxian@tong1:/mnt/md0/LSX/lishuxian/lishuxian/primerdesign/Long_subdatabase/dnaid$ muscle -version
muscle 5.1.linux64 [] Built May 16 2023 07:53:40
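MUSCLE 5.1 aborted here on a one-record input. A cheap precaution, sketched below in Python as an assumption rather than anything PMPrimer or MUSCLE provides, is to validate and count FASTA records before aligning. The two-record minimum only reflects that a multiple alignment needs at least two sequences; `count_fasta_records` and `check_alignment_input` are hypothetical names.

```python
from pathlib import Path


def count_fasta_records(text: str) -> int:
    """Count FASTA records (one per '>' header); reject sequence data before the first header."""
    n = 0
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.startswith(">"):
            n += 1
        elif line.strip() and n == 0:
            raise ValueError(f"line {lineno}: sequence data before the first '>' header")
    return n


def check_alignment_input(path: str, min_records: int = 2) -> int:
    """Raise ValueError if the FASTA file has too few records for a multiple alignment."""
    n = count_fasta_records(Path(path).read_text())
    if n < min_records:
        raise ValueError(f"{path}: {n} FASTA record(s); at least {min_records} needed to align")
    return n
```

Calling `check_alignment_input("modified.fasta")` before `muscle -align` would flag a one-record file up front instead of relying on MUSCLE to handle it.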

After adjusting the format, I generated modified.fasta:
(args_oap) lishuxian@tong1:/mnt/md0/LSX/lishuxian/lishuxian/primerdesign/Long_subdatabase/dnaid/align$ ~/primerdesign/PMPrimer/pmprimer.py -f modified.fasta -p notsameseq -a threshold:0.50 merge primer2 --debuglevel 1
Number Of Sequece Data is 1
Everage Length Of Sequece Data is 861 bp
Number After Duplicate Remove is 1
Number After Keep Majority Length is 1
Number After Keep Different Subspecies When Same Sequece is 1
Number After Remove Unclassfied is 1
Subspecies after cleaning: 1; species after cleaning: 1; genera after cleaning: 1
{'modified.fasta'}
Number After Progress is 1
Detecting all conserved regions in the aligned sequences...

Shannon Terminate Or Continue Cannot Detect Enough Conserved Region
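The log message names a Shannon-entropy method for finding conserved regions. Its exact rules are not shown in this thread, so the following is only a generic sketch: it computes per-column Shannon entropy over an alignment and collects runs of low-entropy columns. The thresholds `max_entropy` and `min_len` are hypothetical, and this is not PMPrimer's implementation.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of the residues in one alignment column."""
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in Counter(column).values())


def conserved_regions(alignment: list[str], max_entropy: float = 0.5, min_len: int = 5) -> list[list[int]]:
    """Return 1-based inclusive [start, end] runs of columns with entropy <= max_entropy."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    regions, start = [], None
    for i in range(length):
        column = "".join(seq[i] for seq in alignment)
        if column_entropy(column) <= max_entropy:
            if start is None:
                start = i  # a low-entropy run begins
        elif start is not None:
            if i - start >= min_len:  # run covered columns start..i-1
                regions.append([start + 1, i])
            start = None
    if start is not None and length - start >= min_len:
        regions.append([start + 1, length])
    return regions
```

With one sequence every column has zero entropy, so `conserved_regions([seq])` returns a single region spanning the whole sequence and there is no variable stretch between two conserved blocks to amplify, consistent with the maintainer's later remark that a single sequence cannot be used for multiplex primer design.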

MAXINELSX commented 9 months ago

047246.1 type subtypes 1
ATGCATACGCGGAAGGCAATAACGGAGGCGCTTCAAAAACTCGGAGTCCAAACCGGTGACCTCTTGATGGTGCATGCCTC
ACTTAAAGCGATTGGTCCGGTCGAAGGAGGAGCGGAGACGGTCGTTGCCGCGTTACGCTCCGCGGTTGGGCCGACTGGCA
CTGTGATGGGATACGCGTCGTGGGACCGATCACCCTACGAGGAGACTCTGAATGGCGCTCGGCTGGATGACGAAGCCCGC
CGTACCTGGCTGCCGTTCGATCCCGCAACAGCCGGGACTTACCGTGGGTTCGGCCTGCTGAATCAATTTCTGGTTCAAGC
CCCCGGCGCGCGGCGCAGCGCGCACCCCGATGCATCGATGGTCGCGGTTGGTCCGCTGGCTGAAACGCTGACGGAGCCTC
ACGAACTCGGTCACGCCTTGGGGGAAGGATCGCCCGTCGAGCGGTTCGTTCGCCTTGGCGGGAAGGCCCTGCTGTTGGGT
GCGCCGCTAAACTCCGTTACCGCATTGCACTACGCCGAGGCGGTTGCCGATATCCCCAACAAACGGTGGGTGACGTATGA
GATGCCGATGCTTGGAAGAGACGGTGAAGTCGCCTGGAAAACGGCATCGGATTACGATTCAAACGGCATTCTCGATTGCT
TTGCTATCGAAGGAAAGCCGGATGCGGTTGAAACTATAGCAAATGCTTACGTGAAGCTCGGTCGCCATCGAGAAGGTGTC
GTGGGCTTTGCTCAGTGCTACCTGTTCGACGCGCAGGACATCGTGACGTTCGGCGTCACCTATCTTGAGAAGCATTTCGG
AACCACTCCGATCGTGCCTCCGCACGAGGCCGTCGAGCCGTCTTGCGAGCCTTCAGGTTAG

N3R1UM commented 9 months ago

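A single sequence cannot be used to design multiplex primers. As for the amplicon candidates ending in "No Right Amplicon" (no suitable amplicon region), the debug output gives the reason: the haplotype counts of the candidate amplicon regions exceed the default threshold. For this case, the README explains that you can adjust the haplotype threshold with the -e hpcnt:x parameter.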
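The haplotype check described above can be pictured as a filter over pairs of conserved regions. In this sketch, which is an assumption and not PMPrimer's code, the haplotype count of a region is the number of distinct variants of that alignment slice, and a pair is kept only if both counts are at most `hpcnt` (named after the `-e hpcnt:x` option); `min_amplicon` is a hypothetical length cut-off.

```python
from itertools import combinations


def haplotype_count(alignment: list[str], region: list[int]) -> int:
    """Number of distinct sequence variants in a 1-based inclusive [start, end] region."""
    start, end = region
    return len({seq[start - 1:end] for seq in alignment})


def candidate_pairs(
    alignment: list[str],
    regions: list[list[int]],
    hpcnt: int,
    min_amplicon: int = 0,
) -> list[tuple[list[int], list[int]]]:
    """Keep (forward, reverse) region pairs whose haplotype counts are both <= hpcnt."""
    kept = []
    for fwd, rev in combinations(sorted(regions), 2):
        if rev[1] - fwd[0] + 1 < min_amplicon:
            continue  # amplicon span too short
        if max(haplotype_count(alignment, fwd), haplotype_count(alignment, rev)) <= hpcnt:
            kept.append((fwd, rev))
    return kept
```

In this model, raising `hpcnt` admits more pairs, which is the kind of effect the `-e hpcnt:x` option is described as having.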

MAXINELSX commented 9 months ago


(args_oap) lishuxian@tong1:/mnt/md0/LSX/lishuxian/lishuxian/primerdesign/Long_subdatabase/dnaid/align$ ~/primerdesign/PMPrimer/pmprimer.py -f modified.fasta -p notlen notsameseq matrix -a threshold:0.05 merge primer2 --debuglevel 1 --evaluate hpcnt:1000 minlen:300 save
Number Of Sequece Data is 285
Everage Length Of Sequece Data is 1554 bp
Number After Duplicate Remove is 285
Number After Keep Majority Length is 285
Number After Keep Different Subspecies When Same Sequece is 285
Number After Remove Unclassfied is 285
Subspecies after cleaning: 1; species after cleaning: 1; genera after cleaning: 1
{'modified.fasta'}
Number After Progress is 285
Detecting all conserved regions in the aligned sequences...

Shannon Terminate Or Continue Cannot Detect Enough Conserved Region

The same failure persists: nothing is generated.

MAXINELSX commented 9 months ago

I tried more files:
Number Of Sequece Data is 13
Everage Length Of Sequece Data is 816 bp
Number After Duplicate Remove is 13
Number After Keep Majority Length is 13
Number After Keep Different Subspecies When Same Sequece is 13
Number After Remove Unclassfied is 13
Subspecies after cleaning: 1; species after cleaning: 1; genera after cleaning: 1
{'modif'}
Number After Progress is 13
Detecting all conserved regions in the aligned sequences...
[60, 75] | 0.16849943135221637 | [[1, 58]] | 2 |
[77, 158] | 0.16849943135221637 | [[1, 75]] | 3 |
[160, 196] | 0.2667533917361943 | [[1, 158]] | 4 |
[198, 263] | 0.2667533917361943 | [[1, 196]] | 4 |
[279, 297] | 0.33699886270443274 | [[1, 263]] | 5 |
[314, 339] | 0.6739977254088655 | [[1, 297]] | 7 |
[341, 429] | 0.16849943135221637 | [[1, 339]] | 8 |
[431, 530] | 0.16849943135221637 | [[1, 429]] | 9 |
[548, 603] | 0.7722516857928433 | [[1, 530]] | 12 |
[605, 642] | 0.16849943135221637 | [[1, 530], [548, 603]] | 2 |
[666, 703] | 0.6739977254088655 | [[1, 530], [548, 642]] | 4 |
[705, 791] | 0.16849943135221637 | [[1, 530], [548, 703]] | 4 |
[793, 816] | 0.2667533917361943 | [[1, 530], [548, 791]] | 5 |
Detection of all conserved regions in the aligned sequences complete
List Of Conservative Regions is : [[1, 530], [548, 816]]
List Of Non Conservative Regions is : [[531, 547]]

Designing primers from the conserved regions... 2/2
Primer design from the conserved regions complete

Secondary primer extraction from the conserved regions... 1/2([446, 465], [127, 146])
Secondary primer extraction from the conserved regions... 2/2([718, 737], [718, 737])

Secondary primer extraction from the conserved regions complete

Generating amplicon candidates...
Amplicon candidate generation complete: 1 in total
[([1, 530], [548, 816])]
Traceback (most recent call last):
  File "/home/lishuxian/primerdesign/PMPrimer/pmprimer.py", line 13, in <module>
    entry()
  File "/home/lishuxian/primerdesign/PMPrimer/piece/piecentry.py", line 96, in entry
    pc.maintrunk()
  File "/home/lishuxian/primerdesign/PMPrimer/piece/piecemain.py", line 531, in maintrunk
    amplicon_info = pcel.recommend_area_primer(dct=pcdp._same_cnt if 'pcdp' in locals().keys() is not None else None)
                                                   ^^^^^^^^^^^^^^
AttributeError: 'piecedataprogress' object has no attribute '_same_cnt'
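The failing line guards with `'pcdp' in locals().keys() is not None`. Python chains comparison operators, so this evaluates as `('pcdp' in locals().keys()) and (locals().keys() is not None)`: it only checks that a variable named `pcdp` exists, not that the object actually has `_same_cnt`. The sketch below reproduces that behaviour and shows a defensive alternative using `getattr` with a default. `FakeProgress`, `original_guard` and `safe_same_cnt` are hypothetical names, and why `_same_cnt` was never set in this run is not established in the thread.

```python
class FakeProgress:
    """Hypothetical stand-in for PMPrimer's piecedataprogress object."""


def original_guard(local_vars: dict) -> bool:
    # Same shape as the guard in the traceback. Chained comparison:
    # ('pcdp' in keys) and (keys is not None), so it only tests the variable name.
    keys = local_vars.keys()
    return 'pcdp' in keys is not None


def safe_same_cnt(local_vars: dict):
    # Defensive alternative: None if pcdp is absent or lacks _same_cnt.
    return getattr(local_vars.get('pcdp'), '_same_cnt', None)


scope = {'pcdp': FakeProgress()}  # object without _same_cnt, as in the traceback
print(original_guard(scope))      # True, so pcdp._same_cnt would be accessed
print(safe_same_cnt(scope))       # None instead of an AttributeError
```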

N3R1UM commented 9 months ago

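Sorry, I will be very busy for a while. Please refer to the README first and try adjusting your settings based on the three example commands in it. Thank you for your understanding.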