Szymonome closed this issue 8 years ago
Hi Szymon,
I was not able to reproduce your error. Could you kindly send me the 73rd read causing the error from your fasta file together with the command you used to run the pipeline?
Thanks. Chenhao.
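For readers who need to pull out a single read by position, as requested above, the following is a minimal sketch using only the Python standard library. The function name `nth_fasta_record` and the tiny in-memory example are illustrative assumptions and are not part of INC-Seq; it assumes a standard FASTA file in which every record starts with a `>` header line.

```python
import io


def nth_fasta_record(handle, n):
    """Return (header, sequence) for the n-th record (1-based) of a FASTA stream."""
    if n < 1:
        raise ValueError("n must be >= 1")
    count = 0          # number of headers seen so far
    header = None
    chunks = []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if count == n:
                break  # the wanted record is complete
            count += 1
            if count == n:
                header = line[1:]
        elif count == n:
            chunks.append(line.strip())  # sequence may span several lines
    if count < n:
        raise IndexError("file has only {} record(s)".format(count))
    return header, "".join(chunks)


# Hypothetical three-read file held in memory for illustration.
demo = io.StringIO(">r1\nACGT\n>r2\nGG\nTT\n>r3\nA\n")
print(nth_fasta_record(demo, 2))
```

With a real file, the same call would be `nth_fasta_record(open("pass.fasta"), 73)`.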
Hi Chenhao,
We run INC-Seq on CentOS 6.3 with Python 2.7.8. Would that make any difference?
The remaining programs were installed according to your requirements: Biopython 1.65 and BLAST 2.2.28+.
Command I used: /home/opt/INC-Seq/inc-seq.py -i pass.fasta -o pass.out
Read 73:
2d46a7a6-0bb5-4994-b1bb-bc8494695cbb_Basecall_2D_2d GISNB474_10bacRCAsheared091215_4947_1_ch100_file68_strand pass/GISNB474_10bacRCAsheared091215_4947_1_ch100_file68_strand.fast5
CCGTGGTTATACTTAGCCCGGAAGACAACCTTACCAAATCTTGACATCCTTTGACACTCTAGGATAGAGCCTTCCCCTTCGGGGACAAAGTGACAGGTGTGGCATGGTTGTCAGCTCGTGTCACGCTTTCTAAAGGGAGGCAGCAGTAGGGAATCTTCCGCAATGGGCGAAAGCCTGACGGAGCAACGCCGCGTGAGTGATGAAAGGTCTTCTGGATCGTAAAACTCTGTTATTAGGGAAGAACATATGTGTAAGTAACTGTGCATCTTGACGGTACCTAAGGCCGAAAGCCACGGCTAACACGTGCCAGCAGCCCGGTAATACGTAGGTGGCAAGCGTTATCCGGAATTATTGGGCGTAAAGCGCGCGTAGGCGGTTTAAGTCTGATGTGAAGCCCACGGCTCAACCGTGGAGGGTCATTGGAAACTGGAAAACTTGAGTGCAGAACAGGAAAGTGGAATTCCATGTGTAGCGGTGAAAAATGCGCGGATATGGAGGAACACCAGTGTGAAGGCGACTTTCTGGTCTGTAACTGACGCTGATGTGCGAAAGCGTGGGGATCAAACAGGATTAGATACCTTGGTAGTCCACGCCGTAAACGATGAGTGCTAAGTGTTAGGGGGTTTCCGCCCTTAGTGCTGCAATGGACCCGCATTAAGCACTCCGCCTGGGGAGTACGACCGCAGGTTGAAACTCAAAGGAATTGACGGGGACCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGGCAACCGGACGCGAAGAACCTTACCAAATCTTGACATCCTTTGACAACAACTCTAGAGATAGAGCCTTCCCCTTCGGGGACAAATGACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGCTGTCCTCCACGGGAGGCAGCGATCAGGGAATTCCGCGAAACTGGCAAGCTGAGGCGCCAGTAGTATGAAGGTTCGGATCGTAAACTCTGTTATTAGGAAGAACATATGTGATATGTGCACATCTTGACGGTACCTAATGAAGACGCTAACTACGTGCCAGCAGCCCGCGGTAGGTACCACGTAGGTGGCAAGCGTTATCCGGAATTATTGGGCGTAAAGGGCGCGTAGGCGGTTTTAAGTCTGATGTGAAAGCCCACGGCTCAACCCGTGGAGGTCATTGGAATCTGGAAACTTGAGTGGAGCAGAAGAGGAAAGTGGAATTCCATGTGTAGCGGTGAAATGCGCAGAGATATGAGGAACACCAGTGGCGAAGGCGATTCTGGTCTGTAACTGACGCTGATGTGCGAAAGTGTGGGGGATCAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGAGTGCTAAGTGTTAGGGGATTCCGCCCCTTAGCTGCTGCGCAGCTAACGCATTTGAGGCCGCTCCGCCTGGGGAGTACGACCGCAGGTTGAACTCAAAGGAATTGACGGGGACCCGCACAAGCGGTGGAGCATGTGGTTTAATGCGAAGCAACGGGGTGAAAGAACCTTACCACAAATCTTGACATCCTTTGACAACTCCGAGACCAGCCTTCCCCTTCGGGGGACAAAGTGACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGCTTCTACGGGAGGCAGCAGTAGGAATCTTCCGCAATGGGCGAAAGCCTGACGGAGCAACGCCGCGTGAGTGATGAAGGTCTTCGGATCGTAAAAACTCTGTTATTAGGGGAAGAACATATGTGTAATAACTGTGCACATCTTGACGGTATAAGATTACAGAAAGCCACGGCTAACTCGTGCCAGCACCCCCGGGGCGGTAATACGTAGGTGCAAGCGTTATCCGGAATTATTGGGCGTAAGGGCGCGTAGGCGGTTTTAAGTCTGATGTGAAAGCCCACGGCTCAACCGTGGAGGGTCATTGGAAACTGGAAAACTTAATATTAAGAAGAGGAAAGTGGAAATTCCATGTGTAGCACGTGAAATCCCAGAGATATGGAGGAACACCAGTGGCGAAGGCGACTTT
CTGGTCTTATGACTGACGCTGATGTGCGAGAAAGCGTGGGGGATCAAACAGGATTAGATACCCCTGGTAGTCCCACGCCGTAAACGATGAGTGCTAAGTGTTAGGGGTTTTCCGCACGCTGATCAACTGCATAGGCATTCCACACTCCGCCTGGGGAGTACGACCGCAAGGTTGAAAAACCTGCAAAGGAATGACGGGGACCCGCACAAGCATCTTGGAGCATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCAAATCTTATGACCGCCTTTGACACTCTAGAGATAGAGCCTTCCCCTTCGGGGGACAAAGTACCTAGGTTGCATGGTTGTCGTCAGCTCGTGTCGGTCCTCCCGGGAGGCAGCAGTAGGGAATCTTCCGCAATGGGCGAAGCCTGACGGAGCAACGCCGCGTGAGTGATATGAAGGTCTTCGATCGTAAAACTCTGTTATTAGGGAAAGAACATATGTGTAAGTAACTGTGCACATCTTGACGGTACGGATCAGAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCGCAAGCGTTAGCGGAATTATTGCGTAAGGGCGCGGTAGGGCGGTTTAAGTCTGATGTGAAAGCCCACGGCTCACCGTGTTGGGGGGTCATTTGGAAATGGGAAAACTTGAGTGCAGAAGAAAGTGGAATTCCATGTGTAGCGTGAAATGCGCAGAGATATGGAGGAACACCAGTGGCAAAGCGACTTTCTGGTCTGTAACTGACGCTGATGTGCGAAAGCGTGGGGATCAAACACCCAAGTCGATACCCCTGGTAGTCCACCGCCGTAAACGATGAGTGCTAAGTGTTAGGGGGTTTCCGCCCTTAGTGCGCTGCAGCACTGGAAGTTAAGCACTCCGCCTGGGGAGTACGACCGCAAGGTTGAAACCTGCCAAAGGAATTGACTGGATAGGGACAAGCGGTGGAGCATTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCAAATCTTGACATCCTTTGACAACTCTAGAGATAGAGCCTTCCCTGTCGGGGAGACAAAGTGACAGGTGGTGCATCGGTTGTCGTCAGCTCGTGTCGCTTCTACGGGAGGCAGCAGTAGGGAATCTTCCGCAATGGGTGAAAGCCTGACGGAGCAACGCCGCGTGAGTGATGAAGGTCTTCGATCGTAAACTCTGTTATTAGGGAAGAACATATGTGTAAGTAACTGTGCACATCTTGACGGTACCTAAGATCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGCGGTAATACGTAGGTGGCAGAGCGTTGATACATAGGGAATTATTGGGCGTAAGCGCGCGTAGGCGGTTTTAAGTCTGATGTGAAAGCCCACGGCTCAACCGTGGAGGGTCATTGGAAACTGGAAAACTTGAGTGCAGAAAGAGGAAAGTGGAATTCCATGTGTGTACCTCGGTGAAATGCGCAGAGATATGGAGGAACACCAGTGGCGAAGGCGACTTTTCTGGTCTGTAAAACCTGACGCTGATGTGCGAAAACGTGGGGATCAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGAGTGCTAAGTGTTAGGGGGTTTCCGCTTAGTGCTGCGCAGCTAACGCATTAAGCACTCCGCCTGGGGGGAGTCACGACCGCAAGGTTGAAACTCAAAGGAATTGACGGGGACCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGAGCAACGCGAAGAACCTTACCAAATCTTGACATCCTGTCTTTGACAACTCTAGAGATAGAGCCTTCCCCTTCCTAGGCAAACAAAGTGACAGGTGCATGGTTGTCGTCAGCTCGGATTGCTTCTACGGGAGGCAGCAAGTGAGGAATCTTCCGCAATGGGCGAAAGCCTGACGGAGCAACGCCGCGTGAGTGATGAAGGTCTTCGCTGTAAAACTCTGTTATTAGGGAAGAACATATGTGTAAGTAACTGTGCACATCTTG
ACGGTACCTAAGATCTACAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGGTAATACGTAGGTGGCAAGCGTTATCCGGGGGAATTATTGGGCGTAAAGCGCGCGGTAGGCTTTTTAAGTCTGATGTGAAAGCCCACGCTCAACCGTGGAGGTGCTATTGGAAACTGGAAAACTTGAGTGCCAAGAAGAGGAAAGTGGAATTCCATGTGTAGCGGGTGCGAAATGCGCAGAGATATGGAGTAACGAGTGGCGAAGGCGACTTTCTGGTCTTAACTGACGCTGATGTGCGAAAGCCGTGGGGGATCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGAGTGCTAAGTGTTAGGGGGTTCCGCCTTAGTGCGCAGCTAACGCATTAAGCACTCCGCCTGGGGGAGTACGACCGCAGGTTGAAACTCAAAGGAATTGACGGCACTGGAACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCGGACCGAAGAACCTTACCAAATCGTATTGACATCCTTTGACAACTCTAGAGATAGAGCCTTTCCCCTTCGGGGGACAAAGTGACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGCTTCTACGGGAGGCAGCAGTAGGGGAATCTTCCGCAATGGGCGAAAGCCTGACGGAGCAACGCCGCGTGAGTATCATCGAAGGTCTTCGGATCGTAAAACTCTGTTATTAGGGAAGAACATATGTGTAATGACTGTGCACATCTTGGACGTGGACAGTGATAAAGCCGGCTATAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGCAGGCGTTATCCGGAAATTATTGGGCGTAAAGGGCGCGTAGGCATTTTTTAAGTCTGATGTGAAAGCCCACGGCTCAACCGTGGAGGTGCCATTGGCGAACTGGAAAACTTGAGTGCAGAAGAGGAAAGTGGAATTCCATGTGTAGCGGTGAAATGCGCAGAGATATGGAGGAACACCAGTGGCGAAGGCGACTTTCTGGTCTGTAACTGACGCTGATGTGCGAAGCGTGGGGAAGTCCAAACAGGATTAGATACCCTAGTAGTCCCACGGGCCTGACGATGAGTGCTAAGTGTGTTAGGGGGTTTCCGCCCCTGCCCTGTGCAGCTACGCATTATTAAGCACTCTCCGCCTGGGGGAGTACGACCGCAAGGTTGAAACTCAAAGGAATGACGGGGACCCGCACAAGCGGTGGAGCCGTGGTTTAATTCGAAGCAACGCGAAGATCTATTACCAAATCTTGACATCCTTTGACAACTCTAGAGATAGAGCCTTCTCGTCGGGGACAAAGTGACAGGTGGTGCATGGTTGGTCAGTGCTTCTACGGGAGGCAGCAGTAGGGAATCTTCCGCAAACTTGGGCGAAAGCCTGACGGAGCAAACGCCGCGTGAGTGATGAAGGTCTTCGTGGATCGTAAACTCTGTTATTAGGGAAGAACATATGTGTAAGTAACTGTGCACATCTTGACGGTACGGATCAGAAAGCCCACGGCTAACTTACGTGCCAGCAGCCAGCGCGGTAATACGTAGGTGGCAAGCGTTATCCGGAATTATTGGGCGTAAAGCGCGCGTAGGCGGTTTTAAGTCTGATGTGAAGCCCCACGGCTCAACCGTGGAGGGTCGTGGAAACTGGAAGCTCTTGGCTAACTTGAAAGAGGAAAGTGGAATTCCATGTGTAGCGTGAAATGCGCAGAGATATGGCCCAGGAACACCAGTGGCGAGGCAACTTTCTGGTCTGTAACTGACGCTGATGTGCGAAAGCGTGTGGGGATCGAACAGGATTGATACCCTGGTAGTCCACGGATCAACGATGAGTGCTAAGTGTTAGGGGTTTCGGCCCCTTAGTGCTGTGCAGCTAACGCATTAAGCGGCACTCCGCCTGGGGAGTACGACCGCAGATCCGAGTAAAGGAAGTCGCACGAAACTAGACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCAATGCGAA
GAACCTTACCAAATTATTACACCACATGATACGTTTATTTCC
When I used: /home/opt/INC-Seq/inc-seq.py -i data.fa -o pass.out -a graphmap -m 500
the program stopped even sooner:
---------- Processing read 31 ----------
Max number of segments found: 6
Number of segments of the candidate strech: 6
Candidate read found!
Traceback (most recent call last):
File "/home/opt/INC-Seq/inc-seq.py", line 160, in <module>
The only time INC-Seq worked fine was with the POA aligner: /home/opt/INC-Seq/inc-seq.py -i data.fa -o pass.out -a poa -m 500
However, Graphmap and Blast crash every time.
Kind regards Szymon
Hi Szymon,
Do you have PBDAGCON in your path?
Chenhao
Chenhao,
Yes, we have:
export PATH=/home/opt/pacb/bin:$PATH
ls /home/opt/pacb/bin
bam2bax ccmake gfortran h5dump h5repart pbindexdump toe
bam2plx clear gif2h5 h5import h5stat pbmerge tput
bam2sam cmake h52gif h5jam h5unjam pls2fasta tset
bax2bam cpack h5c++ h5ls infocmp reset
blasr cpp h5cc h5mkgrp infotocap samtools
captoinfo ctest h5copy h5perf_serial loadPulses sawriter
CC g++ h5debug h5redeploy ncurses6-config tabs
ccache gcc h5diff h5repack pbindex tic
Do we need anything else?
Szymon
What happens if you run "pbdagcon" directly in your terminal? If you see a "command not found" error, you can clone https://github.com/PacificBiosciences/pbdagcon, compile it, and add it to your path.
Chenhao.
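The same check can be done from Python before a run starts. This is a minimal sketch, assuming Python 3.3 or later for `shutil.which`; the helper name `missing_tools` and the list of tool names are illustrative assumptions, not something INC-Seq itself provides.

```python
import shutil


def missing_tools(names):
    """Return the executable names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


# Assumed tool names for illustration; adjust to the aligner you actually use.
required = ["pbdagcon", "blastn"]
missing = missing_tools(required)
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("All required tools were found.")
```

A preflight check like this reports the missing program by name, which is easier to act on than a traceback raised partway through the input.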
Hi Szymon,
In the meantime, I have also pushed a new commit that includes the pbdagcon binary used for our manuscript. Can you test it out?
Chenhao.
Chenhao,
It works now. I set up the environment with:
export PYENV_ROOT="/home/opt/.pyenv"
export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init -)"
export PYTHONPATH=/home/opt/INC-Seq/utils:$PYTHONPATH
export PATH=/home/opt/pacb/bin:$PATH
export PATH=/home/opt/pbdagcon/src/cpp:$PATH
/home/opt/INC-Seq/inc-seq.py --help
I have analysed over 250 reads, but my output file is still empty. Is that expected?
It looks like some of the data is being saved in a logs folder.
Szymon
I am not sure about the "logs folder" you are referring to. Are you using bpipe to run INC-Seq?
I used the basic command: /home/opt/INC-Seq/inc-seq.py -i pass.fasta -o pass.out
The program generated a folder called logs with one file inside, myeasylog.log:
2016-06-07 00:47:04,498 INFO [default] Multi-threaded. Input: /dev/shm/incseq_pass.fasta_2016-06-07_00-46-47.733103/3c0475d65c347bb420808531b425e5ce.tmp.m5, Threads: 4
2016-06-07 00:47:04,500 DEBUG [Reader] [szymonome@becker.eng.gla.ac.uk] [FUNCTION] [FILE:0] Consensus candidate: 2d46a7a6-0bb5-4994-b1bb-bc8494695cbb_Basecall_2D_2d_7
2016-06-07 00:47:04,501 INFO [Consensus] Consensus calling: 2d46a7a6-0bb5-4994-b1bb-bc8494695cbb_Basecall_2D_2d_7 Alignments: 6
2016-06-07 00:47:10,218 INFO [default] Multi-threaded. Input: /dev/shm/incseq_pass.fasta_2016-06-07_00-46-47.733103/589b2c2a28cbbdb58343a9ebed958a67.tmp.m5, Threads: 4
2016-06-07 00:47:10,219 DEBUG [Reader] [szymonome@becker.eng.gla.ac.uk] [FUNCTION] [FILE:0] Consensus candidate: 0efc0fab-903e-4489-923f-b36e553bdd38_Basecall_2D_2d_6
2016-06-07 00:47:10,219 INFO [Consensus] Consensus calling: 0efc0fab-903e-4489-923f-b36e553bdd38_Basecall_2D_2d_6 Alignments: 5
However, pass.out is still empty.
Hi Szymon,
This does not seem to be the logging style of INC-Seq. It also looks odd to me that multi-threading was used. A typical run of INC-Seq should produce something like the following (with the read you sent me). Did you get any output when you ran the test case in data/inc_seq_test_read.fa?
00:07:21|lich@n067|tmp$ ~/projects_backup/INCSeq/inc-seq.py -i tmp.fa
---------- Processing read 1 ----------
Max number of segments found: 7
Number of segments of the candidate strech: 7
Candidate read found!
Consensus called 2d46a7a6-0bb5-4994-b1bb-bc8494695cbb_Basecall_2D_2d Number of segments 7
2d46a7a6-0bb5-4994-b1bb-bc8494695cbb_Basecall_2D_2d_7/0_763 GTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCAAATCTTGACATCCTTTGACAACTCTAGAGATAGAGCCTTCCCCTTCGGGGGACAAAGTGACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGCTTCTACGGGAGGCAGCAGTAGGGAATCTTCCGCAATGGGCGAAAGCCTGACGGAGCAACGCCGCGTGAGTGATGAAGGTCTTCGGATCGTAAAACTCTGTTATTAGGGAAGAACATATGTGTAAGTAACTGTGCACATCTTGACGGTACCTAAGATGAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTATCCGGAATTATTGGGCGTAAAGGGCGCGTAGGCGGTTTTAAGTCTGATGTGAAAGCCCACGGCTCAACCGTGGAGGGTCATTGGAAACTGGAAAACTTGAGTGCAGAAGAGGAAAGTGGAATTCCATGTGTAGCGGTGAAATGCGCAGAGATATGGAGGAACACCAGTGGCGAAGGCGACTTTCTGGTCTGTAACTGACGCTGATGTGCGAAAGCGTGGGGATCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGAGTGCTAAGTGTTAGGGGGTTTCCGCCCTTAGCGCAGCTAACGCATTAAAGCACTCCGCCTGGGGAGTACGACCGCAAGGTTGAAACTCAAAGGAATTGACGGGGACCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCAACGCGAAGA
The test file generated the correct output:
/inc-seq.py -i inc_seq_test_read.fa -o out.fa
---------- Processing read 1 ----------
Max number of segments found: 20
Number of segments of the candidate strech: 20
Candidate read found!
Warning: PBDAGCON timeout! Trimming 1 base(s).
Consensus called ddfdd3f2-c50b-4843-b1f2-c1669785858a_Basecall_2D_2d Number of segments 20
So for the ladder_rep_1 data, do I have to wait until all reads have been processed before the output file is updated? It looks like the output file is not updated as soon as a consensus read is generated. Below, a consensus was called on read 25, but the pass.out file is still empty.
---------- Processing read 25 ----------
Max number of segments found: 7
Number of segments of the candidate strech: 7
Candidate read found!
Consensus called 2d46a7a6-0bb5-4994-b1bb-bc8494695cbb_Basecall_2D_2d Number of segments 7
---------- Processing read 26 ----------
Max number of segments found: 2
Not enough alignmets!
Consensus construction failed!
---------- Processing read 27 ----------
Max number of segments found: 6
Number of segments of the candidate strech: 6
Candidate read found!
Regards Szymon
For cases like the read 25 you showed, a consensus should have been called. Most likely the output buffer has not been flushed yet, so nothing has been written to your output file. You can test this with a few reads first (preferably a few long reads like read 25) and see whether you get the output FASTA file.
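To illustrate the buffering effect in general terms, here is a minimal Python sketch. It does not reproduce how INC-Seq writes its output, which this thread does not show; the function `write_record`, the record name, and the temporary file are hypothetical. The point is only that data written to a file object may sit in a buffer until `flush()` is called or the file is closed.

```python
import os
import tempfile


def write_record(handle, header, sequence):
    """Write one FASTA record and make it visible to other readers at once."""
    handle.write(">{}\n{}\n".format(header, sequence))
    handle.flush()              # empty Python's userspace buffer
    os.fsync(handle.fileno())   # ask the OS to commit the data to disk


# Demo: the record is readable while the writer is still open.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "pass.out")
    with open(path, "w") as out:
        write_record(out, "read_25_consensus", "ACGT")
        with open(path) as check:
            print(check.read(), end="")
```

Without the explicit flush, a short run can finish processing several reads before anything appears in the file, which would match the empty pass.out described above.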
I subsampled 100 reads and it looks like the program works fine now. INC-Seq generated 3 concatemerised molecules. Thank you very much for your help!
I looked through the INC-Seq code and found the section where you use the primer sequence to split concatemers. However, this part of the code is not available for use at the moment. Are you still planning to support that mode in the future? Did you get any good results with that approach?
That is great!
Initially we thought we could detect the primer sequence in the corrected reads to resolve the correct orientation. However, I found that the primer sequence could be detected in only a very limited number of consensus sequences, possibly due to the loss of part of the primer sequence during library preparation. So I would not restore that function. If you are interested, you can check out a version before commit "fc20bdc8cac50281c26f833f945d79ef81b0aa77", which should have the implementation for recovering the orientation based on primer sequences.
For 16S classification, the consensus reads can simply be mapped to a reference database (e.g. SILVA) with BLASTN to restore the correct orientation. Another trick (which we used in the manuscript) is to concatenate the consensus twice, which should theoretically restore the correct orientation.
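One way to read the doubling trick: a consensus built from a concatemer may begin at an arbitrary point within the repeated unit, so it is a rotation of the true amplicon. Concatenating it with itself guarantees that the full unit appears somewhere as one contiguous stretch. The sketch below demonstrates this on a made-up sequence; the function names and the toy sequence are illustrative assumptions and are not taken from the INC-Seq code. A reverse-complement helper is included because strand still has to be resolved separately, for example by the BLASTN mapping mentioned above.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def doubled(consensus):
    """Concatenate a consensus with itself."""
    return consensus + consensus


def contains_unit(consensus, unit):
    """True if `unit` occurs contiguously in the doubled consensus on either strand."""
    twice = doubled(consensus)
    return unit in twice or reverse_complement(unit) in twice


# Toy example: the consensus starts 4 bases into the true unit.
unit = "ACGTTGCAAG"
rotated = unit[4:] + unit[:4]
print(unit in rotated)               # the rotated consensus alone
print(unit in doubled(rotated))      # the doubled consensus
```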
Hello,
Last week I was testing the INC-Seq software with your data "ladder_rep_1", and after processing read 73 I got an error (please see below). The software appears to have worked fine for the first 72 reads. Do you know why this happens?
Regards Szymon
---------- Processing read 72 ----------
Max number of segments found: 0
Consensus construction failed!
---------- Processing read 73 ----------
Max number of segments found: 7
Number of segments of the candidate strech: 7
Candidate read found!
Traceback (most recent call last):
  File "/home/opt/INC-Seq/inc-seq.py", line 160, in <module>
    sys.exit(main(sys.argv[1:]))
  File "/home/opt/INC-Seq/inc-seq.py", line 147, in main
    args.seg_cov, args.iterative)
  File "/home/opt/INC-Seq/inc-seq.py", line 24, in callBuildConsensus
    seg_cov, iterative)
  File "/software/INC-Seq/utils/buildConsensus.py", line 288, in consensus_blastn
    consensus = pbdagcon(tmpname+'.m5', 0)
  File "/software/INC-Seq/utils/buildConsensus.py", line 206, in pbdagcon
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  File "/home/opt/.pyenv/versions/2.7.8/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/home/opt/.pyenv/versions/2.7.8/lib/python2.7/subprocess.py", line 1327, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
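When `subprocess.Popen` raises `OSError` with errno 2 (ENOENT) and no working directory was specified, it usually means the executable named in the command could not be found, not that an input file is missing. The following minimal sketch shows one way to turn that into a clearer message. The wrapper `run_tool` is a hypothetical helper, not the `pbdagcon` function from `buildConsensus.py`, and it is written for Python 3, whereas the thread uses Python 2.7.

```python
import errno
import subprocess
import sys


def run_tool(cmd):
    """Run a command and return (returncode, stdout, stderr) as bytes."""
    try:
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    except OSError as exc:
        if exc.errno == errno.ENOENT:
            # The program itself is missing or not on PATH.
            raise RuntimeError(
                "Executable not found: {!r}. Is it installed and on PATH?".format(cmd[0])
            )
        raise
    out, err = proc.communicate()
    return proc.returncode, out, err


# A command that exists on any system running this script.
print(run_tool([sys.executable, "-c", "print('ok')"]))

# A command that does not exist (made-up name) gives a readable error.
try:
    run_tool(["no-such-binary-incseq-demo"])
except RuntimeError as exc:
    print(exc)
```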