BIMSBbioinfo / scregseg

Single-cell regulatory landscape segmentation
GNU General Public License v3.0
5 stars 2 forks source link

New option for function make tile: keep_chroms #1

Closed prauten closed 2 years ago

prauten commented 2 years ago

I added a new optional option to the make_tile function that allows you to select those chromosomes from which you want to keep the fragments. If this option is enabled, the remove_chroms option is ignored. If keep_chroms is not used as an argument, the function runs as before.

Unfortunately, opening the files in PyCharm seems to have changed some of the indentation and spaces, too.

I read the contributing guidelines but could not comply with all of them yet.

For reference, they are:

  1. Include passing tests (run tox);
  2. Update documentation when there's new API, functionality etc.
  3. Add a note to CHANGELOG.rst about the changes.
  4. Add yourself to AUTHORS.rst.

Regarding 1: I ran tox, not all tests succeeded, but this is not due to the novel added functionality but must stem from previous issues.

Regarding 2: I am happy to add a line about this in the jupyter notebook if you want to.

Regarding 3: The CHANGELOG.rst is currently not up to date, such that I disregarded this for now.

Regarding 4: Done.

wkopp commented 2 years ago

Hi Pia, thank you for your contribution and request. In general, I'm fine with adding this functionality.

I'd have two small requests:

  1. Could you add the additional argument in the docstring of the function so that it's clear what input is expected. From what I understand, I'd assume keep_chroms: None or list(str), right?
  2. There is a bit of repetitive code by using if keep_chrs is not None: ... else ... now. In general, it's better to keep repetitive code to a minimum amount, if possible. Here, I think, this would be possible by simply making the selection of the keep_chroms before the for chrom in genomesize: loop on the genomesize dictionary. That is by putting genomesize = {chrom: genomesize[chrom] for chrom in keep_chroms} if keep_chroms is not None before the loop. That way the rest of the function would stay the same and be a bit slimmer. Would you mind adapting it that way? Thank you very much!

Best, Wolfgang

prauten commented 2 years ago

Hi Wolfgang,

Thank you very much for the quick review and for agreeing with the addition of this feature.

  1. Done!
  2. Great suggestion! I changed it the way you proposed which slims it down a lot.

Just one more thing I noticed, the docstring and default of the make_counting_bins function option remove_chroms is an empty list. However, the way the command line parsing is set up, this will be ['chrM', 'chrY', 'chrX'] unless you specify the remove_chroms argument without input on the command line. Maybe this is what you want, but it is not intuitive for me that I have to add --remove_chroms NOTHING, in case I don't want to filter any chromosomes on the command line.

Best, Pia

wkopp commented 2 years ago

Thank you! Regarding the default parameters. As long as the defaults are documented at all levels, I'd say it's fine. Even if they differ. True, you might need to specify a dummy chromosome to prevent the filtering from the command line. However, I never really needed to do that previously (e.g. to keep mitochondrial reads). In my experience, more often than not, you would want to exclude some chromosomes/contigs.