frisen-lab / TREX

Simultaneous lineage TRacking and EXpression profiling of single cells using RNA-seq
MIT License

Allow "partial" matches to the exclusion list #72

Closed: marcelm closed this 1 month ago

marcelm commented 1 month ago

This fixes a regression introduced in PR #71. If the clone ID that we look up in the exclusion list contains "-" or "0" characters, only the remaining characters must be compared. This is now handled by a new SimilaritySet class that uses exact lookups as long as the clone ID contains no "-" or "0" and falls back to the old algorithm otherwise.

This adds a minute or so to processing the exclusion list. That is slower, but overall it is still much better than a runtime on the order of days.
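
For illustration, the lookup strategy described above could be sketched roughly as follows. This is not the code from the PR: the method names, the `WILDCARDS` constant, and the matching rules are assumptions. In particular, the sketch assumes that "-" and "0" occur only in the looked-up clone ID (not in the stored entries), that such positions match any character, and that only entries of equal length can match. The slow path here is a plain linear scan standing in for "the old algorithm", whose details are not given in this description.

```python
class SimilaritySet:
    """Hypothetical sketch of a set with exact and partial lookups.

    Not the TREX implementation; names and matching rules are assumptions.
    """

    # Characters in a query that are treated as "unknown" (assumption).
    WILDCARDS = frozenset("-0")

    def __init__(self, items=()):
        self._items = set(items)

    def add(self, item):
        self._items.add(item)

    def __contains__(self, query):
        # Fast path: no "-" or "0" in the query, so a hash lookup suffices.
        if not self.WILDCARDS.intersection(query):
            return query in self._items

        # Slow path: compare only the positions that are not "-" or "0".
        positions = [i for i, c in enumerate(query) if c not in self.WILDCARDS]
        return any(
            len(item) == len(query)
            and all(item[i] == query[i] for i in positions)
            for item in self._items
        )


excluded = SimilaritySet(["ACGT", "TTGA"])
print("ACGT" in excluded)  # True (exact lookup)
print("A-G0" in excluded)  # True (only positions 0 and 2 are compared)
print("A-C0" in excluded)  # False
```

With this design, the cost of the linear scan is paid only for clone IDs that actually contain "-" or "0"; all other lookups stay constant-time on average.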