chengyuan / reago-1.1


collapse_graph issue #3

Open AdamSorrel opened 8 years ago

AdamSorrel commented 8 years ago

Hi there,

I was trying to run reago 1.1 on my mock community and ran into an error.

(python2.7conda)stovicek_lab@StovicekLab:~/Projects/MockAssemblyReago$ python ~/Bioinformatics/reago-1.1/reago.py filter_out/filtered.fasta sample_out -l 101

Mon Dec  7 12:17:25 2015 REAGO (v1.10) started...
Input file: filter_out/filtered.fasta
Parameters:
-e 0.05
-f 1350
-b 10
-l 101.0
-o 0.7
-t 30
Mon Dec  7 12:17:25 2015 Reading input file...
Mon Dec  7 12:17:25 2015 Initializing overlap graph...
Mon Dec  7 12:17:25 2015 Recovering 16S rRNAs...
Traceback (most recent call last):
  File "/home/stovicek_lab/Bioinformatics/reago-1.1/reago.py", line 831, in <module>
    subgraph = collapse_graph(subgraph, [])
  File "/home/stovicek_lab/Bioinformatics/reago-1.1/reago.py", line 298, in collapse_graph
    offset = len(read_db[predecessor]) - overlap_to_predecessor
KeyError: '2364.2|1838.2|808.2|812.2|2966.2|3314.1|3394.2|3186.2|706.2|1885.2|2253.2|1018.2|929.1|194.2|1626.1|687.1|3226.1|2800.2|2732.2|34.1|1346.1|596.2|2045.2|2828.2|2783.2|1891.2|2610.2|568.2|995.2|919.1|2445.2|3005.1|2421.2|2526.2|365.2|1700.1|432.2|3316.1|13645.2|2246.2|3123.2|1376.2|847.2|2449.2|3206.2|2532.2|2194.1|2121.2|3121.2|308.2|904.2|1406.2|1130.2|370.2|262.2|2027.2|460.2|2434.2'
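The key in the `KeyError` is a `|`-joined chain of what look like read IDs with `.1`/`.2` mate suffixes. The helper below splits that key and reports which components do not appear as record IDs in the input FASTA. It is a hypothetical diagnostic sketch and is not part of reago. It assumes that a FASTA record ID is the first whitespace-delimited token after `>`. It makes no claim about how reago builds or stores these keys internally.

```python
# Hypothetical diagnostic helpers; not part of reago.

def split_node_key(key):
    """Split a '|'-joined key (as seen in the KeyError) into component IDs."""
    return key.split("|")


def read_fasta_ids(path):
    """Collect record IDs from a FASTA file.

    Assumes the ID is the first whitespace-delimited token after '>'.
    """
    ids = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:  # skip malformed headers that are just '>'
                    ids.add(fields[0])
    return ids


def missing_ids(key, known_ids):
    """Return the components of `key` that are not in `known_ids`, in order."""
    return [rid for rid in split_node_key(key) if rid not in known_ids]
```

For example, `missing_ids(key, read_fasta_ids("filter_out/filtered.fasta"))` lists the components that are not record IDs in the input. That may help narrow down where the chain came from.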

I have played around a little and found that when I change the sequence length (the -l flag), the issue goes away for values larger than 110. However, my sequences are all exactly 100 bp long.
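The run above used `-l 101`, but the reads are stated to be 100 bp. It may therefore be worth confirming the actual length distribution of the input before choosing a value for `-l`. The sketch below tallies sequence lengths in a FASTA file, including records wrapped over several lines. The function name `read_length_counts` is hypothetical and is not part of reago.

```python
from collections import Counter


def read_length_counts(path):
    """Count how many FASTA records have each sequence length.

    Hypothetical helper, not part of reago. Handles sequences wrapped
    over multiple lines; sequence lines before the first header are ignored.
    """
    counts = Counter()
    length = None  # None until the first header is seen
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if length is not None:
                    counts[length] += 1
                length = 0
            elif length is not None:
                length += len(line)
    # Close the final record.
    if length is not None:
        counts[length] += 1
    return counts
```

If every read is 100 bp, `read_length_counts("filter_out/filtered.fasta")` should return a `Counter` with the single key `100`. This thread does not establish whether a matching `-l` value avoids the `KeyError`.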

Attached is my input file, should you want to run the thing yourself.

filtered.fasta.zip (https://github.com/chengyuan/reago-1.1/files/53120/filtered.fasta.zip)

chengyuan commented 8 years ago

Adam,

It may be related to a known issue in readjoiner. I came up with a workaround for it a while ago. See the attachment. Could you try it and let me know whether the issue goes away?

Thanks, Cheng


AdamSorrel commented 8 years ago

I might be a bit slow in the morning, but I don't see any attachment. Are you sure you added it?