prune2df warning messages

TobiTekath commented 4 years ago

Hi, thanks for developing this very useful toolkit.

I am wondering whether it is normal/expected to get so many warning messages while running prune2df. I get warning messages like

pyscenic.transform - WARNING - Less than 80% of the genes in some_gene could be mapped to hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.feather. Skipping this module.

or

pyscenic.transform - WARNING - Less than 80% of the genes in Regulon for some_regulon could be mapped to hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.feather. Skipping this module.

I get these warnings with the two hg38-Database files as well as with the six hg19-Database files - so the hg-version of the db does not seem to be the cause. My data is annotated with gencode, so it should be hg38.

The results of prune2df() do look quite good; I am just not sure about the ~29000 warning messages I get in the process.
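With this many warnings it can help to tally them per database rather than read them one by one. The following is a minimal sketch using only the standard `logging` module. It assumes that `pyscenic.transform`, the prefix shown in the warning lines, is the logger name; `SkipCounter` and the `db_a.feather` / `db_b.feather` names are hypothetical, and the `logger.warning(...)` calls merely stand in for a real pruning run.

```python
import logging
import re
from collections import Counter

# Assumption: the "pyscenic.transform" prefix in the warnings is the logger name.
LOGGER_NAME = "pyscenic.transform"

# Matches the tail of the quoted warning and captures the database file name.
SKIP_PATTERN = re.compile(r"could be mapped to (\S+)\. Skipping this module\.")


class SkipCounter(logging.Handler):
    """Count 'Skipping this module' warnings per ranking database."""

    def __init__(self):
        super().__init__(level=logging.WARNING)
        self.per_database = Counter()

    def emit(self, record):
        match = SKIP_PATTERN.search(record.getMessage())
        if match:
            self.per_database[match.group(1)] += 1


counter = SkipCounter()
logger = logging.getLogger(LOGGER_NAME)
logger.addHandler(counter)

# Stand-in for the pruning run: warnings shaped like the ones quoted above.
template = "Less than 80%% of the genes in %s could be mapped to %s. Skipping this module."
logger.warning(template, "module_1", "db_a.feather")
logger.warning(template, "module_2", "db_a.feather")
logger.warning(template, "module_3", "db_b.feather")
logger.warning("Some unrelated warning")  # not counted

for database, n_skipped in counter.per_database.items():
    print(database, n_skipped)
```

In a real session the handler would be attached before calling prune2df; whether worker processes forward their log records to the parent process depends on the multiprocessing setup, so the counts may be incomplete there.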

My prune2df call looks like this:

df = prune2df(rnkdbs=dbs, modules=modules, motif_annotations_fname=MOTIF_ANNOTATIONS_FNAME, client_or_address="custom_multiprocessing", num_workers=30)

with dbs: [FeatherRankingDatabase(name="hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.feather"), FeatherRankingDatabase(name="hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.feather")]

and

[FeatherRankingDatabase(name="hg19-500bp-upstream-7species.mc9nr.feather"), FeatherRankingDatabase(name="hg19-500bp-upstream-10species.mc9nr.feather"), FeatherRankingDatabase(name="hg19-tss-centered-5kb-7species.mc9nr.feather"), FeatherRankingDatabase(name="hg19-tss-centered-5kb-10species.mc9nr.feather"), FeatherRankingDatabase(name="hg19-tss-centered-10kb-7species.mc9nr.feather"), FeatherRankingDatabase(name="hg19-tss-centered-10kb-10species.mc9nr.feather")]

respectively. I am using motifs-v9-nr.hgnc-m0.001-o0.0.tbl as motif annotation and hs_hgnc_curated_tfs.txt as tfs.

I am using pyscenic version 0.9.19.

Thanks in advance.

alyamahmoud commented 4 years ago

I am getting the same error message while using the Python version of pySCENIC (not the command line version). Is this something that pops up particularly with hg38?

TobiTekath commented 4 years ago

Just as a quick update: I see the same warning messages when using the CLI version as well as in Jupyter.

@bramvds It would be great if you could clarify whether it is expected to have so many warnings. At least I see other people (#138) experiencing the same warning messages.

cflerin commented 4 years ago

It's normal to have tens or hundreds of these warnings when running the pruning step (and it's a warning, not an error). The cause is just what it states: for a given module that is being pruned, too few of its genes are present in the database. The module is then excluded from further analysis.
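The check described here can be illustrated with plain Python sets. This is only a sketch of the idea, not pySCENIC's actual implementation: `mapped_fraction`, `keep_module`, and the gene lists are hypothetical, and the `>=` comparison is inferred from the "Less than 80%" wording of the warning.

```python
def mapped_fraction(module_genes, database_genes):
    """Fraction of a module's genes that are present in the ranking database."""
    module_genes = set(module_genes)
    if not module_genes:
        return 0.0
    return len(module_genes & set(database_genes)) / len(module_genes)


def keep_module(module_genes, database_genes, threshold=0.8):
    """Return False for modules that would be skipped with the warning above."""
    return mapped_fraction(module_genes, database_genes) >= threshold


# Illustrative gene sets only; the UNMAPPED_* names stand for genes absent from the database.
database_genes = {"GATA1", "TAL1", "KLF1", "SPI1", "CEBPA"}
well_mapped = ["GATA1", "TAL1", "KLF1", "SPI1", "CEBPA"]
poorly_mapped = ["GATA1", "TAL1", "UNMAPPED_1", "UNMAPPED_2", "UNMAPPED_3"]

for name, genes in [("well_mapped", well_mapped), ("poorly_mapped", poorly_mapped)]:
    fraction = mapped_fraction(genes, database_genes)
    print(name, fraction, "kept" if keep_module(genes, database_genes) else "skipped")
```

Running the same comparison on one's own modules and the gene names in the database is a quick way to see whether the skipped modules share a pattern, such as a different gene identifier type.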

prullens commented 3 years ago

Doesn't 80% seem a bit rigid? I'd be happy if ~50% of the module targets have motif enrichment. Or is 80% a justified threshold in your experience? Could one for example manually lower this percentage for modules to be included in further analysis? In my dataset an unfortunate number of interesting TFs are excluded consequent to this threshold.

Best,

klprint commented 2 years ago

I would like to chime in here and bump this issue. How can I lower the 80% cutoff? I end up with only 26 transcription factors in the activity matrix at the end of pyscenic due to this pruning step.

RinconFer commented 2 years ago

I would also like to know this; in my dataset the pruning step removes hundreds of regulons, some of them really interesting to me.

Thank you

klprint commented 2 years ago

@RinconFer I just created a pull request #387 for pySCENIC which will allow you to change the cutoff. I obviously cannot guarantee 100% that it works as expected, but in my test runs it did.

You can test it in your setting using the following conda environment:

conda create -n pyscenic-test python=3.7 pip git
conda activate pyscenic-test
pip install git+https://github.com/klprint/pySCENIC@relax_module2df

RinconFer commented 2 years ago

Thank you very much!!

I'll test it as soon as possible and let you know how it goes.

klprint commented 2 years ago

FYI, the parameter for changing the cutoff is the following. I forgot to add it in my previous message.

pyscenic ctx \
    .... \
    --frac_mapping_module 0.8 \
    ....

Beki-seq commented 1 year ago

Hello, I have run into exactly the same issue. I also tried the command you gave to solve the problem; however, my pyscenic said it cannot recognize frac_mapping_module 0.8. Is there any specific order required for the --frac_mapping_module argument?

And my code is:

pyscenic ctx \
    adj.sample.tsv $feather \
    --annotations_fname $tbl \
    --frac_mapping_module 0.8 \
    --expression_mtx_fname $input_loom \
    --mode "dask_multiprocessing" \
    --output reg.csv \
    --num_workers 20 \
    --mask_dropouts

li-xuyang28 commented 1 year ago

@klprint Thanks for implementing the module cutoff, which is definitely much needed. However, I ran into the following error when setting the cutoff to 0.5:

IndexError: boolean index did not match indexed array along dimension 0; dimension is 55 but corresponding boolean dimension is 107

Could you please suggest/advise on how to resolve it? Thanks again.

razorofockham commented 11 months ago

@li-xuyang28 I am seeing the same error even when running with the default 0.8 cutoff. Did you ever manage to get this to run? Cheers!

klgoss commented 10 months ago

Hello, I wanted to bump this as I'm experiencing the same issue as @li-xuyang28 . I've installed the pyscenic-test environment as @klprint described above, but am met with the following error: pyscenic: error: unrecognized arguments: --frac_mapping_module 0.5

Any insight is greatly appreciated!

LacquerHed commented 10 months ago

Having the same issue as @klgoss above: the test environment does not contain the argument --frac_mapping_module.
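One way to confirm whether a given installation exposes the option is to search the help text of `pyscenic ctx`. The sketch below assumes the CLI prints its options on `--help` (the "unrecognized arguments" message suggests an argparse-style parser, which normally does); the helper names `help_mentions_flag` and `installed_cli_supports` are hypothetical.

```python
import shutil
import subprocess

FLAG = "--frac_mapping_module"


def help_mentions_flag(help_text, flag=FLAG):
    """Return True if the flag appears anywhere in the given help text."""
    return flag in help_text


def installed_cli_supports(flag=FLAG):
    """Look for the flag in the help output of the installed `pyscenic ctx`.

    Returns None when no `pyscenic` executable is found on PATH.
    """
    if shutil.which("pyscenic") is None:
        return None
    result = subprocess.run(
        ["pyscenic", "ctx", "--help"],
        capture_output=True,
        text=True,
        check=False,
    )
    return help_mentions_flag(result.stdout + result.stderr, flag)


print(installed_cli_supports())
```

A result of False inside the activated test environment would suggest that the installed package does not come from the branch that adds the option, for example because a different pyscenic executable is found first on PATH.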

DiracZhu1998 commented 7 months ago

Hey Xuyang, have you solved this problem? Many thanks!

DiracZhu1998 commented 7 months ago

I just solved it. Check the source code and pay attention to the "annotated_features" variable, which will most likely contain duplicated motif IDs. I had converted all genes to Ensembl IDs based on the gene names and gene synonyms retrieved from Ensembl, since some of the genes SCENIC uses are synonyms rather than gene names. But some genes, like Atf5, also go by another symbol (Atf7), which produced two "ENSMUSG00000038539 ~ cisbp__M0302" lines in the annotated_features variable and in turn caused the "dimension" bug in the pruning step.
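The duplicate rows described above can be removed before pruning. Below is a minimal, library-free sketch of that idea; `drop_duplicate_pairs`, the column names `gene_id` / `motif_id` / `matched_via`, and the third row are hypothetical and do not reflect pySCENIC's internal data structures.

```python
def drop_duplicate_pairs(rows, keys=("gene_id", "motif_id")):
    """Keep the first row for each (gene_id, motif_id) pair, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        pair = tuple(row[key] for key in keys)
        if pair not in seen:
            seen.add(pair)
            unique.append(row)
    return unique


# Hypothetical annotation rows after mapping gene names and synonyms to Ensembl IDs.
rows = [
    {"gene_id": "ENSMUSG00000038539", "motif_id": "cisbp__M0302", "matched_via": "gene name"},
    {"gene_id": "ENSMUSG00000038539", "motif_id": "cisbp__M0302", "matched_via": "synonym"},
    {"gene_id": "PLACEHOLDER_GENE_ID", "motif_id": "PLACEHOLDER_MOTIF", "matched_via": "gene name"},
]

deduplicated = drop_duplicate_pairs(rows)
print(len(rows), "->", len(deduplicated))
```

If the annotations are held in a pandas DataFrame, `DataFrame.drop_duplicates(subset=[...])` on the corresponding columns should achieve the same effect.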

aertslab / pySCENIC

prune2df warning messages #106