biocore / improved-octo-waddle

Balanced parentheses succinct data structure in Python
BSD 3-Clause "New" or "Revised" License

[Question] should n and nm be supported for placements? #35

Open antgonza opened 2 years ago

antgonza commented 2 years ago

While running some tests with the SEPP placements in Qiita, we noticed that the current code rewrites the placements as:

plcmnts['placements'].extend([{'p': placement, 'nm': [[sequence, 1]]}
                              for sequence, placement in placements.items()])

The `nm` form is currently not supported by bp, so we had to rewrite them as:

plcmnts['placements'].extend([{'p': placement, 'n': [sequence]}
                              for sequence, placement in placements.items()])

So I'm wondering whether supporting only `n` is enough, or whether there is any reason why bp should support both `n` and `nm`.
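For anyone hitting the same thing, the rewrite described above could also be done after the fact on an existing placement entry. This is only a minimal sketch: `nm_to_n` is a hypothetical helper (not part of bp or SEPP), the `p` values in the example are made up, and it assumes each `nm` item is a `[name, multiplicity]` pair as in the snippet above. The multiplicities are dropped, so it is only lossless when they are all 1.

```python
def nm_to_n(placement):
    """Return a copy of a placement entry with 'nm' rewritten as 'n'.

    Hypothetical helper for illustration. Assumes every 'nm' item is a
    [name, multiplicity] pair; the multiplicity is discarded.
    """
    if 'nm' not in placement:
        # Nothing to convert; return a shallow copy unchanged.
        return dict(placement)
    # Copy every field except 'nm', then add the names under 'n'.
    converted = {key: value for key, value in placement.items()
                 if key != 'nm'}
    converted['n'] = [name for name, _multiplicity in placement['nm']]
    return converted


# Made-up entry, only to show the shape of the input and output.
entry = {'p': [[0, -1.0, 1.0, 0.1, 0.2]], 'nm': [['seq1', 1]]}
print(nm_to_n(entry))
```

Entries that already use `n` pass through unchanged, so the helper could be applied to every item in `plcmnts['placements']` without checking first.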

cc: @sjanssen2. BTW, the current version can add all the Qiita 150 bp deblur fragments into the GG/SEPP backbone in < 3.5 hrs and ~200 GB.