Psy-Fer / SquiggleKit

SquiggleKit: A toolkit for manipulating nanopore signal data
MIT License

Issue about MotifSeq.py. #48

Open Goatofmountain opened 3 years ago

Goatofmountain commented 3 years ago

Hi James, I ran MotifSeq.py in the example directory of SquiggleKit like this:

    python ../MotifSeq.py -p test.fast5 -m CATCTATCCAGGGTTAAATT.model > test_kmer.tsv

and it fails with the error below:

    Traceback (most recent call last):
      File "../MotifSeq.py", line 520, in <module>
        main()
      File "../MotifSeq.py", line 153, in main
        model, m_order, L = read_bait_model(args.model)
      File "../MotifSeq.py", line 420, in read_bait_model
        L = int(l[1])
    IndexError: list index out of range

I've tried other ways to search for the motif, but they all fail with similar errors. I guess this may be a problem with an assignment in the code.

By the way, I read through MotifSeq.py this afternoon and I cannot understand how local DTW works in the motif-finding process (in the function "get_region_multi"). Does this function align the simulated signal of each base in the motif sequence to a segment of the original signal? If so, how can I choose the best alignment location of the motif in the raw signal?

Psy-Fer commented 3 years ago

Hello,

Here are a few suggestions to help.

The -p flag takes a top-level path, not an individual file. So to use the example fast5 file, point it at the ./example/ folder.

The -m flag is not for that kind of model, though I can understand how that could be confusing (my bad). Instead, pass the CATCTATCCAGGGTTAAATT.fa file with the -i flag. This takes the sequence you give it, converts it into a model file, and uses that with the DTW to find the best hit.

The DTW method returns only the single best hit for that motif in each read it is run against, along with some metrics if you are using med-scaling. I assume from your comment this is what you are looking for?
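
To give a rough idea of how a local (subsequence) DTW can pick one best location, here is a minimal sketch in plain Python. It is an illustration only and not the code in MotifSeq.py: the function name `subsequence_dtw`, its arguments, and the absolute-difference cost are all assumptions made for the example. The whole query (standing in for the simulated motif signal) has to be matched, but it may start and end anywhere in the longer signal, and the best hit is the end position with the lowest accumulated cost.

```python
import math


def subsequence_dtw(query, signal):
    """Align all of `query` to the best-matching segment of `signal`.

    Hypothetical helper for illustration. Returns (cost, start, end),
    where signal[start:end + 1] is the matched segment.
    """
    n, m = len(query), len(signal)
    # cost[i][j]: cheapest alignment of query[:i + 1] that ends at signal[j]
    cost = [[math.inf] * m for _ in range(n)]
    # start[i][j]: signal index where that cheapest alignment began
    start = [[0] * m for _ in range(n)]

    # Free start: the first query point may match any signal position
    for j in range(m):
        cost[0][j] = abs(query[0] - signal[j])
        start[0][j] = j

    for i in range(1, n):
        for j in range(m):
            d = abs(query[i] - signal[j])
            # Predecessors: advance query only, signal only, or both
            candidates = [(cost[i - 1][j], start[i - 1][j])]
            if j > 0:
                candidates.append((cost[i][j - 1], start[i][j - 1]))
                candidates.append((cost[i - 1][j - 1], start[i - 1][j - 1]))
            best_cost, best_start = min(candidates)
            cost[i][j] = best_cost + d
            start[i][j] = best_start

    # Free end: the best hit is the cheapest cell in the last query row
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return cost[n - 1][end], start[n - 1][end], end
```

For example, `subsequence_dtw([1.0, 5.0, 1.0], [3.0, 3.0, 1.0, 5.0, 1.0, 3.0, 3.0])` finds the query at signal positions 2 to 4 with zero cost. With real data the query and the raw signal would need to be on a comparable scale before the costs mean anything.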

Let me know if you have any other questions or can't get it to work with my suggestions above.

James