arogozhnikov / demuxalot

Reliable, scalable, efficient demultiplexing for single-cell RNA sequencing
MIT License
24 stars 3 forks source link

Parameterize `parse_read` for custom UMI-tags etc. #24

Closed mschilli87 closed 9 months ago

mschilli87 commented 9 months ago

While cellranger uses the UB SAM tag, other scRNA-seq analyses use different SAM tags (e.g. XM) for the same information. By parameterizing the existing parse_read function, this (and other read parsing/filtering options) can be adjusted by the caller without the need for boilerplate code required for a custom callback class. At the same time, by providing the previously hard-coded values as default parameters, compatibility with existing code is kept. All functions using this callback have been extended by optional keyword arguments to allow passing through the newly added parameters.

This addresses one out of two issues raised in https://github.com/herophilus/demuxalot/issues/23.


closes #26

arogozhnikov commented 9 months ago

So looking at this.

For user it is actually more readable to just overwrite handler, for example:

custom_parse_read = lambda read: parse_file(read, umi_tag="UB")

count_snps(... , parse_read=custom_parse_read)

rather than figuring out parameters for parse_read, and then passing corresponding kwargs.

This has advantage of being more direct, and IDE can suggest parameters and complain on arguments.

So I recommend just dropping kwargs :)

mschilli87 commented 9 months ago

@arogozhnikov: Great suggestion. As I said, my Python is a bit rusty so the only two options that occured to me where a class full custom, like you suggested in https://github.com/herophilus/demuxalot/issues/23#issuecomment-1945179338 or the kwargs. But I agree that the lambda is a much better solution. I'll adjust this PR (and #26) accordingly, after another round of testing.

mschilli87 commented 9 months ago

@arogozhnikov: I have made the adjustemts and ran both tests (cellranger-style + my data) again using the lambda appraoch suggested by you and obtained identical results as with the kwargs approach I used initially. If you merge the (updated) PR #26, both, this one and #25 should be closed automatically.

arogozhnikov commented 9 months ago

great, already looks good to me.

We're sorting out things with repo, this shouldn't take long