CGATOxford / UMI-tools

Tools for handling Unique Molecular Identifiers in NGS data sets
MIT License
491 stars 190 forks source link

group with `--per-contig` unnecessarily requires gene tags #568

Closed mfansler closed 1 year ago

mfansler commented 1 year ago

The --per-contig flag requires the --per-gene flag, and if outputting a TSV, ends up in this branch of the code:

https://github.com/CGATOxford/UMI-tools/blob/a5b348993cd220c32a524cf77155c8e42c89a3e1/umi_tools/group.py#L287-L291

This results in an error if there are no gene tags. I believe this is unnecessary because the --per-contig flag already entails that gene should be equated to the contig. Namely, we could instead handle the case like:

  if options.tsv:
    if options.per_gene:
      if options.per_contig:
        gene = read.reference_name
      else:
        gene = read.get_tag(gene_tag)
    else:
      gene = "NA"
IanSudbery commented 1 year ago

I agree. @TomSmithCGAT ?

TomSmithCGAT commented 1 year ago

Completely agree. From the commits, it looks this code section was not updated when the --per-contig option was added.

Thanks very much @mfansler. Do you want to issue a PR so your contribution is properly attributed?

mfansler commented 1 year ago

@TomSmithCGAT sounds good. Yes, happy to put a PR together.

akmorrow13 commented 1 year ago

fixed with https://github.com/CGATOxford/UMI-tools/pull/577

mfansler commented 1 year ago

@akmorrow13 thanks! Lost track of this one 😬