ArtRand / signalAlign

HMM-HDP models for MinION signal alignments
MIT License
45 stars 12 forks source link

signalAlign metrichor version support #2

Closed athulmenon closed 7 years ago

athulmenon commented 8 years ago

I have used metrichor 2.38 for base calling. currently signalAlign is not supporting 2.38 version. Please see below error

signalAlign - indexing reference, done Unsupported Version (1.15.0 and 1.19.0 supported)

How I can run signalAlign for latest metrichor data.

ArtRand commented 8 years ago

Hello, Which dragonet version are you using? You can find it in "/Analyses/Basecall_2D_000/" attributes. Are you using the new R9 kits?

If you are, I'm working on updating to R9 chemistry. The models are different so it's a little more involved. Should be ready in ~1 week.

athulmenon commented 8 years ago

Thanks for the reply. We are using R7 chemistry and dragonet version is 1.15.0

./runSignalAlign --file_directory /2D_Reads/pass/ --ref reference.fasta -o outputdirectory

signalAlign - indexing reference, done File "./runSignalAlign", line 227, in <module> sys.exit(main(sys.argv)) File "./runSignalAlign", line 209, in main alignment.run() File "/data1/ngs/programs/signalAlign/bin/signalAlignLib.py", line 796, in run twod_read_path=temp_2d_read) File "/data1/ngs/programs/signalAlign/bin/signalAlignLib.py", line 74, in get_npRead_2dseq_and_models if npRead.get_twoD_event_map() and npRead.get_template_events() and npRead.get_complement_events(): File "/data1/ngs/programs/signalAlign/bin/signalAlignLib.py", line 507, in get_twoD_event_map twoD_init = self.initialize_twoD() File "/data1/ngs/programs/signalAlign/bin/signalAlignLib.py", line 405, in initialize_twoD if self.fastFive["/Analyses/Basecall_2D"].attrs["dragonet version"] == "1.15.0": File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/tmp/pip-build-vCi1L7/h5py/h5py/_objects.c:2453) File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/tmp/pip-build-vCi1L7/h5py/h5py/_objects.c:2410) File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/group.py", line 164, in __getitem__ oid = h5o.open(self.id, self._e(name), lapl=self._lapl) File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/tmp/pip-build-vCi1L7/h5py/h5py/_objects.c:2453) File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/tmp/pip-build-vCi1L7/h5py/h5py/_objects.c:2410) File "h5py/h5o.pyx", line 190, in h5py.h5o.open (/tmp/pip-build-vCi1L7/h5py/h5py/h5o.c:3502) KeyError: "Unable to open object (Object 'basecall_2d' doesn't exist)"

ArtRand commented 8 years ago

You might have an older version then I had available when developing. Would you mind sending me the offending file? Maybe it is a simple fix. If the R7 chemistry uses a very different model than the R7.3 then I'll have to send you instructions on how to build one. Email: arand@ucsc.edu

Thanks!

athulmenon commented 8 years ago

Hi,

I send you the offending files to your mail. please check it and help me to solve this issue.

Thanks.

sathishkumarRamaswamy commented 8 years ago

Dear Rand I am also getting the same error while running following command

Command ./runSignalAlign -d /pass/ -r reference.fasta -o runsignal_output/\

Error Unsupported Version (1.15.0 and 1.19.0 supported) problem making npRead for pass/minion_ecolistrain5_5_16_1340_1_ch343_read403_strand.fast5

I am looking forward from you.

Thanks and Regards Sathish

genmax commented 8 years ago

Hi,

Please see below signalAlign error. I am not getting GOMP_4.0 installation file. Kindly send me the GOMP_4.0 installation file. ./signalMachine: /usr/lib/x86_64-linux-gnu/libgomp.so.1: version `GOMP_4.0' not found (required by ./signalMachine)

Command used : /programs/ArtRand/signalAlign/bin$ ./runSignalAlign -d /programs/ArtRand/signalAlign/tests/minion_test_reads/sp -r /tests/test_sequences/sp.fasta -o /Methylation_analysis/

Thank you. Kushal

ArtRand commented 8 years ago

Hello all,

The patch to fix the Metrichor version will be out today. Sorry about the delay, I have been trying to refactor the codebase for the R9 chemistry.

As for the GOMP_4.0 issue: This is likely a problem with your local c library not having the openMP headers. You could remove the -fopennp flag from include.mk in the /signalAlign/sonLib directory to fix this problem. The main slow down will be to the HDP building, but if you're only performing alignments it should hardly change performance.

sathishkumarRamaswamy commented 8 years ago

Hi Rand,

Thanks a lot. I have solved GOMP_4 issue. Now it's working for test ecoli data. I have followed the manual, I run the signalAlign commands and I got the result. I have normal output (tab separated tsv files). But I am interested in methylated region, Further, how I should process using signalalign output.

Thanks Sathish

ArtRand commented 8 years ago

Metrichor version 1.20.0 should work now. Make a fresh pull from branch 'master'. Let me know if there are any other problems with R7.3 data.

As far as how to process the methylation data; there is a script /signalAlign/scripts/call_methylation.py designed for this purpose. There is some explanation here: https://github.com/ArtRand/CytosineMethylationAnalysis. I'll write up a step-by-step tutorial on methylation calling and let you know when it's ready/where to find it.

sathishkumarRamaswamy commented 8 years ago

Thanks a lot. Now SignalAlign is working for our data.

If you provide step by step analysis, it will be useful for users.

Best of luck Sathish

ArtRand commented 7 years ago

Hello sorry about the slow response, I'm in the middle of writing a tutorial on calling methylation (and refactoring the model to work with R9).

The position file is a tab-separated file with the format X \t position_0 \t position_1 \t ... position_N \n X \t position_0 \t position_1 \t ... position_N \n The first line has the positions for the template stand (the strand in the FASTA reference sequence). The second line has the positions for the complement strand (note, NOT the reverse complement). You can find an example here (https://github.com/ArtRand/CytosineMethylationAnalysis/blob/master/data/ecoli/test_sites.tsv)

Here are some python functions you can use for generating a CCWGG substitute file, for example:

def make_CCGG_substitute_file(starts, seq, out_file):
    def check_starts(starts, seq, target):
        for start in starts:
            assert seq[start] == target
        return True
    if os.path.exists(out_file):
        print "Outfile you're trying to make already exists"
        return
    fH = open(out_file, 'w')
    if check_starts(starts, seq, "C"):
        fH.write("X\t")
        for start in starts:
            fH.write("{}\t".format(start))
    fH.write("\n")

    # + 2 because the starts are for the second C in CCWGG and on the complement we want to mark the the C in the motif 
    rc_starts = [x + 2 for x in starts] 
    if check_starts(rc_starts, seq, "G"):
        fH.write("X\t")
        for start in rc_starts:
            fH.write("{}\t".format(start))
    fH.write("\n")
    fH.close()

# this function just marks all of the base pairs in starts
def make_substitute_file(starts, seq, out_file):
    if os.path.exists(out_file):
        print "Outfile you're trying to make already exists"
        return
    fH = open(out_file, 'w')

    fH.write("X\t")
    for start in starts:
        fH.write("{}\t".format(start))
    fH.write("\n")

    fH.write("X\t")
    for start in starts:
        fH.write("{}\t".format(start))
    fH.write("\n")

    fH.close()

# makes a motif file that is passed to SignalAlign with the -q flag
def make_motif_file(starts, seq, outfile):
    if os.path.exists(outfile):
        print "Outfile you're trying to make already exists"
        return
    fH = open(outfile, 'w')
    for start in starts:
        s = start - 5
        e = start + 6
        q = seq[s:e]
        fH.write("{start}\t{end}\t{motif}\n".format(start=s, end=e, motif=q))

Also keep in mind that call_methylation.py requires output from SignalAlign in the sparse format using the -s flag.

ArtRand commented 7 years ago

The manual shows an example of how to run SignalAlign to generate methylation calls on positions in a genome: https://github.com/ArtRand/signalAlign/blob/master/Manual.md#input

ArtRand commented 7 years ago

Hello Sorry about the delay. Almost ready with the tutorial. The columns are: site \t strand \t pA \t pC \t pG \t pT \t readID (if you're running the 'variant' option) site \t strand \t pC \t pmC \t phmC \t readID (if you're running the 'threeWay' option)

You need to pass the signalAlign output files as: ../path/to/output/*.tsv

It will all be cleaned up and easier to use v. soon.

Thanks for your patience.

A