Open pclavell opened 1 month ago
Can you include a snippet of your gencodev44_transcript_map.tsv file?
It is tab-separated:
ENSG00000290825.1	ENST00000456328.2
ENSG00000223972.6	ENST00000450305.2
ENSG00000227232.5	ENST00000488147.1
ENSG00000278267.1	ENST00000619216.1
ENSG00000243485.5	ENST00000473358.1
ENSG00000243485.5	ENST00000469289.1
ENSG00000284332.1	ENST00000607096.1
ENSG00000237613.2	ENST00000417324.1
ENSG00000237613.2	ENST00000461467.1
ENSG00000268020.3	ENST00000606857.1
ENSG00000290826.1	ENST00000642116.1
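For anyone wanting to sanity-check a map file like this, here is a minimal Python sketch that reads it into a transcript-to-gene lookup. It assumes two tab-separated columns with the gene ID first and the transcript ID second, as in the snippet above. The function name `load_gene_transcript_map` is made up for illustration and is not part of UMI-tools.

```python
import csv
import io


def load_gene_transcript_map(handle):
    """Return {transcript_id: gene_id} from a two-column, tab-separated file.

    Assumption: column 1 is the gene ID and column 2 is the transcript ID.
    """
    tx2gene = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        gene_id, transcript_id = row[0], row[1]
        tx2gene[transcript_id] = gene_id
    return tx2gene


# Three rows taken from the snippet above.
sample = (
    "ENSG00000290825.1\tENST00000456328.2\n"
    "ENSG00000243485.5\tENST00000473358.1\n"
    "ENSG00000243485.5\tENST00000469289.1\n"
)
tx2gene = load_gene_transcript_map(io.StringIO(sample))
print(tx2gene["ENST00000469289.1"])
```

With a real file, pass `open("gencodev44_transcript_map.tsv", newline="")` instead of the `io.StringIO` sample.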
Ah, I see what's happened here. #577 fixed an issue with group but didn't cover the --gene-transcript-map use case, where the implications of the fix weren't obvious. We don't have tests covering that option either, so it wasn't picked up! 🤦
I'll try and issue a patch today/tomorrow.
Note to self: Add switch back to using read tag for gene id when using tx2gene map here: https://github.com/CGATOxford/UMI-tools/blame/9ce3a70a8b35ff9a066d73716680136be71cc70d/umi_tools/group.py#L289-L292. Also add a test to cover!
@pclavell - Could you please try installing the ts_debug_issue646 branch to check that it resolves the issue? You can install it with e.g. pip install https://github.com/CGATOxford/UMI-tools/archive/ts_debug_issue646.zip
Hello, I ran this command with UMI-tools 1.0.0 to deduplicate based on UMI + gene mapping (mapping to a pantranscriptome with several transcripts per gene), and it worked:
umi_tools group \
  --method adjacency \
  --edit-distance-threshold=$EDIT_DISTANCE \
  --per-contig \
  --per-gene \
  --gene-transcript-map gencodev44_transcript_map.tsv \
  -I $QUERY \
  --group-out "$NAME"_percontig.tsv \
  --log "$NAME"_percontig.log
The gene column of the --group-out output used to show the gene ID, but now it only repeats the transcript ID.
EDIT: I've just installed version 1.0.0 and it works with exactly the same code and inputs, so something changed between 1.0.0 and 1.1.5.
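A quick way to confirm the symptom is to count how many rows of the group-out file carry a transcript ID in the gene column. The sketch below is only illustrative: it assumes the file is tab-separated with a header row and a column named `gene`, and the demo rows are made up rather than real UMI-tools output. The helper name `count_transcript_ids_in_gene_column` is hypothetical.

```python
import csv
import io


def count_transcript_ids_in_gene_column(handle, transcript_ids, gene_column="gene"):
    """Return (rows_with_transcript_id, total_rows) for a tab-separated table.

    Assumption: the table has a header row containing `gene_column`.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    total = 0
    bad = 0
    for row in reader:
        total += 1
        if row[gene_column] in transcript_ids:
            bad += 1  # gene column holds a transcript ID, not a gene ID
    return bad, total


# Made-up demo table; a real group-out file has more columns.
demo = (
    "read_id\tgene\tumi\n"
    "r1\tENST00000473358.1\tAAAA\n"
    "r2\tENSG00000243485.5\tAAAT\n"
)
known_transcripts = {"ENST00000473358.1", "ENST00000469289.1"}
bad, total = count_transcript_ids_in_gene_column(io.StringIO(demo), known_transcripts)
print(f"{bad} of {total} rows have a transcript ID in the gene column")
```

The transcript IDs can come from the second column of the gene-transcript map, so a nonzero count on the 1.1.5 output and a zero count on the 1.0.0 output would match the behaviour described here.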