ArtRand / signalAlign

HMM-HDP models for MinION signal alignments
MIT License
45 stars 12 forks source link

methylation calling #4

Open al-mcintyre opened 7 years ago

al-mcintyre commented 7 years ago

Hi Art,

I'm trying to call methylation sites in the ecoli test data but there are no results in the output directory. I took out the --twoWay flag, since it was causing an error (& deprecated in favour of -x twoWay?), but otherwise the command is as in the manual:

./runSignalAlign -d ../tests/minion_test_reads/ecoli/ -r ../tests/test_sequences/E.coli_K12.fasta -T ../../HDP_models/ecoli_models/template_trained.hmm -C ../../HDP_models/ecoli_models/complement_trained.hmm -tH ../../HDP_models/ecoli_models/template.multisetPrior2.nhdp -cH ../../HDP_models/ecoli_models/complement.multisetPrior2.nhdp -smt=threeStateHdp -q ../tests/test_regions/test_sites_bal_1.tgt -p ../tests/test_regions/test_labels_bal_1.tsv -x twoWay -s -o ../../ 2> ../../a.err

It seems to run, with the following a.err:

Got 11 files to align to

signalAlign - finished alignments

But the ../../tempFiles_alignment/ directory then contains only the reference files and fastas in directories labelled tempFiles_readname. What type of output should this command be producing or is there another way of calling methylated sites?

Thanks

ArtRand commented 7 years ago

Hello,

Is it possible that you do not have BWA in your $PATH? If you un-comment lines 207 and 208 in runSignalAlign (https://github.com/ArtRand/signalAlign/blob/master/scripts/runSignalAlign.py#L207, then rebuild) it should give you the errors from the individual jobs to stderr.

I am nearly done with a major refactor to support R9 chemistry and methylation calling. The code is still backwards compatible, so you can run your R7.3 samples. In general the methylation calling workflow is more intuitive. I should have the master repo updated in the next 2 days. If the BWA bug isn't the problem or the traceback looks daunting, would you mind waiting until that update?

Thanks,

Art

al-mcintyre commented 7 years ago

Thank you! The error was in opening the HDP files while they were still .bz2 compressed. It's now working but how do I interpret the stdout? eg.

makeson_PC_MA_286_R7.3_gEcoli_10_26_15_2807_1_ch228_file112_strand.fast5 1317 21213(60.881998) 35350(31.660826)

Are the methylation calls recorded at all in the .tsv files from signalAlign or do you have to run the call_methylation.py script? What would be the appropriate command for the example in the manual?

JohnUrban commented 7 years ago

@al-mcintyre did you ever figure this out?

ArtRand commented 7 years ago

Hello John and Al,

The output is file_name\t#template_aligned_pairs(template_score)\tcomplement_aligned_pairs(complement_score)

The score is the average posterior probability of the aligned pairs, ignoring gaps.

PengNi commented 6 years ago

hi, I unziped the .bz2 files and ran the example command but I still can't find the output file of methylated sites. And I have two more questions:

  1. can the models in HDP_models be used to call the methylated sites by using nanopore sequencing data of other species?
  2. the -p command seems to be disabled. I don't understand it.
PengNi commented 6 years ago

more, I un-commented lines 213 and 214 in runSignalAlign, and ran it again, I got the following errors:

signalAlign - indexing reference
signalAlign - indexing reference, done
[SignalAlign::run]Starting on ../tests/minion_test_reads/ecoli/makeson_PC_MA_286_R7.3_gEcoli_10_26_15_2807_1_ch492_file245_strand.fast5
Traceback (most recent call last):
  File "./runSignalAlign", line 232, in <module>
    sys.exit(main(sys.argv))
  File "./runSignalAlign", line 214, in main
    alignment.run()
  File "/home/nipeng/tools/signalAlign/bin/signalAlignLib.py", line 1179, in run
    ok, version, pop_1 = prepareOneD(fast5=self.in_fast5, npRead_path=temp_npRead, oneD_read_path=read_fasta)
  File "/home/nipeng/tools/signalAlign/bin/signalAlignLib.py", line 73, in prepareOneD
    npRead    = NanoporeRead(fast5, False)
  File "/home/nipeng/tools/signalAlign/bin/signalAlignLib.py", line 559, in __init__
    self.initialize()
  File "/home/nipeng/tools/signalAlign/bin/signalAlignLib.py", line 583, in initialize
    self.version = self.fastFive[oneD_root_address].attrs["dragonet version"]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/tmp/pip-nCYoKW-build/h5py/_objects.c:2840)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/tmp/pip-nCYoKW-build/h5py/_objects.c:2798)
  File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/group.py", line 169, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/tmp/pip-nCYoKW-build/h5py/_objects.c:2840)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/tmp/pip-nCYoKW-build/h5py/_objects.c:2798)
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open (/tmp/pip-nCYoKW-build/h5py/h5o.c:3734)
KeyError: "Unable to open object (Object 'basecall_1d_00-1' doesn't exist)"

@ArtRand Did I miss something when installing or using this tool?

PengNi commented 6 years ago

I checked the source code of runSignalAlign again, found the "--2d" option. Now it works.

leosanbu commented 6 years ago

Hi @ArtRand, Following this thread, I am trying to run runSignalAlign for some nanopore data but inside the tempFiles_alignment I can only see the bwa index files, some tempFiles_readname.fast5 folders and two files ending in X.backward.txt and X.forward.txt. I am running it in the most simple way, just ./runSignalAlign -d /folder/to/fast5 -r reference.fna -o outputdir I tried adding --2d as @PengNi, and I have BWA in my path (as you suggested in a previous message). I cannot see the methylation calls, am I missing anything? Thanks in advance for your help

forrest1988 commented 6 years ago

Hi @ArtRand I am trying to run the program using the test Ecoli data but I am getting empty output files. So far without success. I am using command:

./runSignalAlign -d ../tests/minion_test_reads/ecoli/ -r ../tests/test_sequences/E.coli_K12.fasta -T ../../HDP_models/ecoli_r7.3_models/template_trained.hmm -C ../../HDP_models/ecoli_r7.3_models/complement_trained.hmm -tH ../../HDP_models/ecoli_r7.3_models/template.multisetPrior2.nhdp -cH ../../HDP_models/ecoli_r7.3_models/complement.multisetPrior2.nhdp -x cytosine2 -smt=threeStateHdp -q ../tests/test_regions/test_sites_bal_1.tgt -f variantCaller -o ../methCall_ecoli/ --2d 2> ../methCall_ecoli/a.err

this is based on the manual and provided HDP_models. I also took into account earlier advice from this thread: un-commented lines 213 and 214 from runSignalAlign.py and subsequently added "--2d" flag. I have also un-commented "-p" flag, as for whatever reason it was disabled in the code, but it didn't change anything. I was trying to run everything with BWA in versions 0.5.9 and 0.7.12, always in Path.

Still, in best case scenario the output files named for example "makeson_PC_MA_286_R7.3_gEcoli_10_26_15_2807_1_ch228_file112_strand.sm3Hdp.tsv" are empty (0 Bytes). I am attaching the stderr file: a.txt

I really wish to use your program @ArtRand , do you have any advice?

hd00ljy commented 5 years ago

Hi @ArtRand I also got no result files after running the test command.

1) I removed -p option since it is commented in the python scripts 2) I removed options that use files under HDP_models directory since there isn't HDP_models in git-cloned directory

find | grep HDP_models


So the final test command I ran is as follows

cd signalAlign/bin

./runSignalAlign \ -d ../tests/minion_test_reads/ecoli/ \ -r ../tests/test_sequences/E.coli_K12.fasta \ -x cytosine2 \ -smt=threeStateHdp \ -q ../tests/test_regions/test_sites_bal_1.tgt \ -o ../../ \ 2> ../../a.err


bwa is in the PATH and the version is 0.7.17-r1194-dirty


########################################### the sizes of output files I could obtain were like these : fast5 files under temp directories are empty ##########################################

-rw-rw-r-- 1 hd00ljy hd00ljy 856 Dec 5 11:37 ../../a.err -rw-rw-r-- 1 hd00ljy hd00ljy 4.5M Dec 5 11:37 ../../tempFiles_alignment/gi_ecoli.X.backward.txt -rw-rw-r-- 1 hd00ljy hd00ljy 4.5M Dec 5 11:37 ../../tempFiles_alignment/gi_ecoli.X.forward.txt -rw-rw-r-- 1 hd00ljy hd00ljy 12 Dec 5 11:37 ../../tempFiles_alignment/temp_bwaIndex.amb -rw-rw-r-- 1 hd00ljy hd00ljy 43 Dec 5 11:37 ../../tempFiles_alignment/temp_bwaIndex.ann -rw-rw-r-- 1 hd00ljy hd00ljy 4.5M Dec 5 11:37 ../../tempFiles_alignment/temp_bwaIndex.bwt -rw-rw-r-- 1 hd00ljy hd00ljy 1.2M Dec 5 11:37 ../../tempFiles_alignment/temp_bwaIndex.pac -rw-rw-r-- 1 hd00ljy hd00ljy 2.3M Dec 5 11:37 ../../tempFiles_alignment/temp_bwaIndex.sa drwxrwxr-x 2 hd00ljy hd00ljy 4.0K Dec 5 11:37 ../../tempFiles_alignment/tempFiles_makeson_PC_MA_286_R7.3_gEcoli_10_26_15_2807_1_ch228_file112_strand.fast5 -rw-rw-r-- 1 hd00ljy hd00ljy 0 Dec 5 11:37 ../../tempFiles_alignment/tempFiles_makeson_PC_MA_286_R7.3_gEcoli_10_26_15_2807_1_ch228_file112_strand.fast5/temp_makeson_PC_MA_286_R7.3_gEcoli_10_26_15_2807_1_ch228_file112_strand.fast5.npRead -rw-rw-r-- 1 hd00ljy hd00ljy 0 Dec 5 11:37 ../../tempFiles_alignment/tempFiles_makeson_PC_MA_286_R7.3_gEcoli_10_26_15_2807_1_ch228_file112_strand.fast5/temp_seq_makeson_PC_MA_286_R7.3_gEcoli_10_26_15_2807_1_ch228_file112_strand.fast5.fa drwxrwxr-x 2 hd00ljy hd00ljy 4.0K Dec 5 11:37 ../../tempFiles_alignment/tempFiles_makeson_PC_MA_286_R7.3_gEcoli_10_26_15_2807_1_ch412_file504_strand.fast5 -rw-rw-r-- 1 hd00ljy hd00ljy 0 Dec 5 11:37 ../../tempFiles_alignment/tempFiles_makeson_PC_MA_286_R7.3_gEcoli_10_26_15_2807_1_ch412_file504_strand.fast5/temp_makeson_PC_MA_286_R7.3_gEcoli_10_26_15_2807_1_ch412_file504_strand.fast5.npRead -rw-rw-r-- 1 hd00ljy hd00ljy 0 Dec 5 11:37 ../../tempFiles_alignment/tempFiles_makeson_PC_MA_286_R7.3_gEcoli_10_26_15_2807_1_ch412_file504_strand.fast5/temp_seq_makeson_PC_MA_286_R7.3_gEcoli_10_26_15_2807_1_ch412_file504_strand.fast5.fa drwxrwxr-x 2 hd00ljy hd00ljy 4.0K Dec 5 11:37 ../../tempFiles_alignment/tempFiles_makeson_PC_MA_286_R7.3_gEcoli_10_26_15_2807_1_ch454_file165_strand.fast5 -rw-rw-r-- 1 hd00ljy hd00ljy 0 Dec 5 11:37 ../../tempFiles_alignment/tempFiles_makeson_PC_MA_286_R7.3_gEcoli_10_26_15_2807_1_ch454_file165_strand.fast5/temp_makeson_PC_MA_286_R7.3_gEcoli_10_26_15_2807_1_ch454_file165_strand.fast5.npRead -rw-rw-r-- 1 hd00ljy hd00ljy 0 Dec 5 11:37 ../../tempFiles_alignment/tempFiles_makeson_PC_MA_286_R7.3_gEcoli_10_26_15_2807_1_ch454_file165_strand.fast5/temp_seq_makeson_PC_MA_286_R7.3_gEcoli_10_26_15_2807_1_ch454_file165_strand.fast5.fa drwxrwxr-x 2 hd00ljy hd00ljy 4.0K Dec 5 11:37 ../../tempFiles_alignment/tempFiles_makeson_PC_MA_286_R7.3_gEcoli_10_26_15_2807_1_ch492_file245_strand.fast5 -rw-rw-r-- 1 hd00ljy hd00ljy 0 Dec 5 11:37 ../../tempFiles_alignment/tempFiles_makeson_PC_MA_286_R7.3_gEcoli_10_26_15_2807_1_ch492_file245_strand.fast5/temp_makeson_PC_MA_286_R7.3_gEcoli_10_26_15_2807_1_ch492_file245_strand.fast5.npRead -rw-rw-r-- 1 hd00ljy hd00ljy 0 Dec 5 11:37 ../../tempFiles_alignment/tempFiles_makeson_PC_MA_286_R7.3_gEcoli_10_26_15_2807_1_ch492_file245_strand.fast5/temp_seq_makeson_PC_MA_286_R7.3_gEcoli_10_26_15_2807_1_ch492_file245_strand.fast5.fa


########################################## These are the outputs when I ran the test command ##########################################

############### stdout ################

Starting Signal Align

Aligning files from: ../tests/minion_test_reads/ecoli/

Aligning to reference: ../tests/test_sequences/E.coli_K12.fasta

Aligning maximum of 500 files

Using model: threeStateHdp

Using banding: True

Aligning to regions in: ../tests/test_regions/test_sites_bal_1.tgt

Non-default template HMM: None

Non-default complement HMM: None

Template HDP: None

Complement HDP: None

[runSignalAlign]:NOTICE: Got 11 files to align

############### stderr : ../../a.err ################

Command Line: ./runSignalAlign -d ../tests/minion_test_reads/ecoli/ -r ../tests/test_sequences/E.coli_K12.fasta -x cytosine2 -smt=threeStateHdp -q ../tests/test_regions/test_sites_bal_1.tgt -o ../../

signalAlign - indexing reference signalAlign - indexing reference, done [SignalAlign::run]Starting on ../tests/minion_test_reads/ecoli/makeson_PC_MA_286_R7.3_gEcoli_10_26_15_2807_1_ch454_file165_strand.fast5 [SignalAlign::run]Starting on ../tests/minion_test_reads/ecoli/makeson_PC_MA_286_R7.3_gEcoli_10_26_15_2807_1_ch228_file112_strand.fast5 [SignalAlign::run]Starting on ../tests/minion_test_reads/ecoli/makeson_PC_MA_286_R7.3_gEcoli_10_26_15_2807_1_ch412_file504_strand.fast5 [SignalAlign::run]Starting on ../tests/minion_test_reads/ecoli/makeson_PC_MA_286_R7.3_gEcoli_10_26_15_2807_1_ch492_file245_strand.fast5

signalAlign - finished alignments


I hope it will be resolved soon