dpeerlab / spectra

Supervised Pathway DEConvolution of InTerpretable Gene ProgRAms
MIT License

Util function improvements #25

Closed: srose89 closed this issue 1 year ago

srose89 commented 1 year ago

In `check_gene_set_dictionary`, no error or warning is raised when the AnnData object and the gene set dictionary have the same number of keys but some of the keys are misspelled. I propose modifying the function to account for this scenario:

`if (len(adata_labels) < len(annotation_labels)) | (set(annotation_labels) != set(adata_labels)):`

With this condition, the existing print statement will then output the mismatched keys.
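A minimal sketch of a set-based comparison that reports mismatches in both directions. `check_label_sets` is a hypothetical helper written for illustration, not the function in the Spectra codebase; it assumes the cell type labels from the data and the dictionary keys are both available as plain iterables of strings. Comparing sets catches a misspelling even when both sides have the same number of keys, and repeated labels in the data do not matter.

```python
def check_label_sets(adata_labels, annotation_labels):
    """Hypothetical helper: compare data labels with gene set dictionary keys.

    Returns a pair of sorted lists: labels found only in the data and keys
    found only in the dictionary. Both lists are empty when the sets match.
    """
    adata_set = set(adata_labels)
    annotation_set = set(annotation_labels)
    only_in_adata = sorted(adata_set - annotation_set)
    only_in_annotations = sorted(annotation_set - adata_set)
    if only_in_adata or only_in_annotations:
        print("Labels only in the data:", only_in_adata)
        print("Keys only in the gene set dictionary:", only_in_annotations)
    return only_in_adata, only_in_annotations


# Same number of keys, but one is misspelled: a length check alone would pass.
data_labels = ["B_cell", "T_cell", "NK_cell"]
dict_keys = ["B_cell", "T_cel", "NK_cell"]
only_data, only_dict = check_label_sets(data_labels, dict_keys)
```

Returning both lists, rather than a single symmetric difference, makes it clear which side each mismatched key came from.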

Also, cell type labels cannot contain periods, because torch then throws the following error: `KeyError: 'parameter name can\'t contain "."'`

This should be checked as well.
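Because, per the report above, the labels end up as PyTorch parameter names, a check before model construction can replace the torch `KeyError` with a clearer message. This is a minimal sketch; `find_labels_with_periods` and `validate_labels` are hypothetical names, and suggesting `_` as a replacement character is an assumption, not project policy.

```python
def find_labels_with_periods(labels):
    """Return the sorted unique labels that contain a '.' character."""
    return sorted({label for label in labels if "." in str(label)})


def validate_labels(labels):
    """Raise a readable error up front instead of torch's KeyError later."""
    bad = find_labels_with_periods(labels)
    if bad:
        raise ValueError(
            f"Cell type labels must not contain '.': {bad}. "
            "Rename these labels (for example, replace '.' with '_') in both "
            "the data and the gene set dictionary."
        )


labels = ["CD4.T", "B_cell", "CD4.T", "NK"]
print(find_labels_with_periods(labels))  # ['CD4.T']
```

If labels are renamed automatically instead, two distinct labels could collapse into the same name (for example `CD4.T` and `CD4_T`), so raising an error and letting the user rename them is the more conservative choice.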

wallet-maker commented 1 year ago

Good catch. The function now checks whether the keys are identical and, if they are not, returns the mismatched keys. I also moved it to spectra_utils.py, and it now automatically removes gene sets that do not meet the minimum length requirement.
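The automatic removal of short gene sets can be sketched as follows. This is an illustration under stated assumptions, not the code in spectra_utils.py: `filter_gene_sets` and `min_len` are hypothetical names, the dictionary is assumed to have the nested layout `{cell_type: {gene_set_name: [genes, ...]}}`, and the example keys and genes are illustrative.

```python
def filter_gene_sets(gene_set_dict, min_len):
    """Hypothetical helper: drop gene sets with fewer than min_len genes.

    Assumes the nested layout {cell_type: {gene_set_name: [genes, ...]}}.
    Returns the filtered dictionary and a list of (cell_type, gene_set_name)
    pairs that were removed.
    """
    filtered = {}
    removed = []
    for cell_type, gene_sets in gene_set_dict.items():
        kept = {}
        for name, genes in gene_sets.items():
            if len(genes) >= min_len:
                kept[name] = list(genes)
            else:
                removed.append((cell_type, name))
        # Keep the cell type key even if all of its sets were removed,
        # so a later key comparison against the data still lines up.
        filtered[cell_type] = kept
    return filtered, removed


gene_sets = {
    "global": {"interferon": ["ISG15", "IFI6", "MX1"]},
    "T_cell": {"tiny_set": ["CD3E"], "activation": ["CD69", "IL2RA", "CD40LG"]},
}
filtered, removed = filter_gene_sets(gene_sets, min_len=3)
print(removed)  # [('T_cell', 'tiny_set')]
```

Returning the removed pairs alongside the filtered dictionary lets the caller report exactly which gene sets were dropped.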