Closed: ruochiz closed this issue 1 year ago.
Thanks for the input! Would you be able to open a PR for this?
On Jun 25, 2023, at 1:16 PM, ruochiz @.***> wrote:
Thank you for creating this useful toolkit. When running the software on really large combined libraries (~200k cells to consider), I found that the bottleneck becomes the chunk_barcoded_bam.py step, and I found possible solutions to improve it.
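For example, fetch the barcode tag directly with get_tag:

    def getBarcode(read, barcodeTag):
        ''' Parse out the barcode per-read '''
        try:
            # pysam's AlignedSegment.get_tag raises KeyError when the read lacks the tag
            return read.get_tag(barcodeTag)
        except KeyError:
            return "AA"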
This improves the speed from ~100k records/s to ~130k records/s.
— View it on GitHub: https://github.com/caleblareau/mgatk/issues/73
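To see the fallback behaviour of the get_tag lookup without needing a BAM file, here is a small sketch. MockRead is a hypothetical stand-in that mimics only the one pysam behaviour getBarcode relies on: get_tag returns the tag's value and raises KeyError when the tag is missing. The tag name "CB" and the barcode value are assumptions chosen for illustration; the thread does not specify them.

```python
class MockRead:
    """Hypothetical stand-in for pysam.AlignedSegment (only get_tag is mimicked)."""

    def __init__(self, tags):
        self._tags = dict(tags)

    def get_tag(self, tag):
        # Like pysam, raise KeyError when the tag is absent
        if tag not in self._tags:
            raise KeyError(f"tag '{tag}' not present")
        return self._tags[tag]


def getBarcode(read, barcodeTag):
    ''' Parse out the barcode per-read '''
    try:
        return read.get_tag(barcodeTag)
    except KeyError:
        # Reads without the barcode tag get the placeholder "AA"
        return "AA"


reads = [
    MockRead([("CB", "ACGTACGT-1"), ("NM", 0)]),
    MockRead([("NM", 2)]),  # this read has no CB tag
]
print([getBarcode(r, "CB") for r in reads])  # ['ACGTACGT-1', 'AA']
```

With a real file, the same function would be called on each read yielded by iterating over a pysam.AlignmentFile.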
Now implemented in v0.6.8. Thank you very much @ruochiz for the contribution. You should now be able to pip install the latest version of the software.
Thank you for creating this useful toolkit. When running the software on really large combined libraries (~200k cells to consider), I found that the bottleneck becomes the chunk_barcoded_bam.py step, and I found possible solutions to improve it. Storing the barcodes in a set instead of a list:

    # set membership checks are O(1) on average, versus O(n) for a list
    bc = {x.strip() for x in content}

greatly speeds up checking whether a barcode exists (from ~800 records/s to ~100k records/s).
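A rough way to see why the set helps: `x in some_list` scans the list element by element, while `x in some_set` is a hash lookup. The sketch below assumes `content` holds the lines of the barcode file (as from readlines()); here those lines are synthetic random 16-mers, and the query mix is made up. It times both lookups with timeit, so the absolute numbers depend on the machine and are not the figures reported above.

```python
import random
import timeit

# Synthetic stand-in for the barcode file: one barcode per line, with newline
random.seed(0)
content = ["".join(random.choice("ACGT") for _ in range(16)) + "\n" for _ in range(10000)]

bc_list = [x.strip() for x in content]  # membership test scans the list
bc_set = {x.strip() for x in content}   # membership test is a hash lookup

# 200 barcodes that are present plus 200 lookups of one that is absent
queries = bc_list[::50] + ["N" * 16] * 200


def count_hits(container):
    """Count how many queries are found in the given container."""
    return sum(1 for q in queries if q in container)


t_list = timeit.timeit(lambda: count_hits(bc_list), number=3)
t_set = timeit.timeit(lambda: count_hits(bc_set), number=3)
print(f"list: {t_list:.4f}s  set: {t_set:.4f}s")
```

Both containers give the same answers; only the cost per lookup differs, and that gap grows with the number of barcodes, which is why it matters at ~200k cells.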