Open pclavell opened 5 months ago
Can you include a snippet of your gencodev44_transcript_map.tsv file?
It is tab separated
ENSG00000290825.1	ENST00000456328.2
ENSG00000223972.6	ENST00000450305.2
ENSG00000227232.5	ENST00000488147.1
ENSG00000278267.1	ENST00000619216.1
ENSG00000243485.5	ENST00000473358.1
ENSG00000243485.5	ENST00000469289.1
ENSG00000284332.1	ENST00000607096.1
ENSG00000237613.2	ENST00000417324.1
ENSG00000237613.2	ENST00000461467.1
ENSG00000268020.3	ENST00000606857.1
ENSG00000290826.1	ENST00000642116.1
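For anyone wanting to sanity-check a map file like this before passing it to --gene-transcript-map, here is a minimal Python sketch. It assumes two tab-separated columns with the gene ID first and the transcript ID second, as in the snippet above. The function name load_transcript_to_gene is hypothetical and this is not UMI-tools' own parsing code.

```python
import csv
import io


def load_transcript_to_gene(handle):
    """Read gene<TAB>transcript rows and return {transcript_id: gene_id}.

    Hypothetical helper for illustration; not part of UMI-tools.
    """
    transcript_to_gene = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"expected 2 tab-separated columns, got {row!r}")
        gene_id, transcript_id = row
        transcript_to_gene[transcript_id] = gene_id
    return transcript_to_gene


# Three rows copied from the snippet above, inlined so the example runs
# without the real file. Two of them share the same gene ID.
SNIPPET = (
    "ENSG00000290825.1\tENST00000456328.2\n"
    "ENSG00000243485.5\tENST00000473358.1\n"
    "ENSG00000243485.5\tENST00000469289.1\n"
)

tx2gene = load_transcript_to_gene(io.StringIO(SNIPPET))
print(tx2gene["ENST00000473358.1"])
```

With the real file, open("gencodev44_transcript_map.tsv", newline="") could be passed in place of the io.StringIO object. A ValueError here would suggest the file is not strictly two tab-separated columns.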
Ah, I see what's happened here. #577 fixed an issue with group but didn't cover the --gene-transcript-map use case, for which the implications of the fix were not clear. We also don't have tests covering that option, so it wasn't picked up! 🤦
I'll try to issue a patch today/tomorrow.
Note to self: Add switch back to using read tag for gene id when using tx2gene map here: https://github.com/CGATOxford/UMI-tools/blame/9ce3a70a8b35ff9a066d73716680136be71cc70d/umi_tools/group.py#L289-L292. Also add a test to cover!
@pclavell - Could you please try installing the ts_debug_issue646 branch to check whether this resolves the issue? You can install it with e.g. pip install https://github.com/CGATOxford/UMI-tools/archive/ts_debug_issue646.zip
Any update on this?
I'm sorry, I missed the last comment. I just ran it with version 1.0.0. This step is now buried in the middle of a Snakemake pipeline full of temporary intermediate files, and the inputs have been archived, so testing this would mean recovering and rerunning everything. If you really need it tested I could try in the coming weeks, but I am a little bit swamped atm. Thank you
Hello, I ran this code with UMI-tools 1.0.0 to deduplicate based on UMI+gene mapping (but mapping to a pantranscriptome with several transcripts per gene) and it worked:
umi_tools group \
    --method adjacency \
    --edit-distance-threshold=$EDIT_DISTANCE \
    --per-contig \
    --per-gene \
    --gene-transcript-map gencodev44_transcript_map.tsv \
    -I $QUERY \
    --group-out "$NAME"_percontig.tsv \
    --log "$NAME"_percontig.log
The group-out output used to show the gene ID in the gene column, but now it only repeats the transcript ID.
EDIT: I've just installed version 1.0.0 and it works using exactly the same code and inputs, so there is a problem introduced between 1.0.0 and 1.1.5.
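A quick way to confirm the symptom on a given --group-out file is to count how many rows carry a transcript ID in the gene column. This is a rough sketch under stated assumptions: the file is tab-separated with a header row containing a column named gene, and transcript IDs start with ENST. The function name and the example rows are made up for illustration and are not real output from this run.

```python
import csv
import io


def count_transcript_like_genes(handle, gene_column="gene", prefix="ENST"):
    """Return (suspicious, total): rows whose gene column looks like a transcript ID.

    Hypothetical helper; the column name and ID prefix are assumptions.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    total = 0
    suspicious = 0
    for row in reader:
        total += 1
        if row[gene_column].startswith(prefix):
            suspicious += 1
    return suspicious, total


# Made-up rows for illustration only; the column set is reduced and assumed.
EXAMPLE = (
    "read_id\tcontig\tgene\tumi\n"
    "read1\tENST00000456328.2\tENSG00000290825.1\tACGT\n"
    "read2\tENST00000450305.2\tENST00000450305.2\tACGA\n"
)

suspicious, total = count_transcript_like_genes(io.StringIO(EXAMPLE))
print(f"{suspicious} of {total} rows have a transcript ID in the gene column")
```

Running the same check on the group-out files from 1.0.0 and 1.1.5 should make the difference between the two versions easy to see: a nonzero suspicious count would match the behaviour described above.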