Closed WhaleGe closed 2 years ago
Hi @WhaleGe
This is something that was not considered, so thank you for posting this bug report.
Indeed, how the wrapper function is written now it won't work without networking, this will be changed in the next release!
However, you can still run pycistarget but it will need some more manual work. The biomart host is only needed to balance the amount of promoter regions in foreground and background sets for running DEM and for removing promoter regions.
For now I would suggest to run pycistarget without the wrapper funciton, you can basically follow this tutorial for that: https://pycistarget.readthedocs.io/en/latest/pycistarget_scenic%2B_human_brain.html
The pseudocode to do something like this would look like:
menr = {}
for key in region_sets.keys():
regions = region_sets[key]
menr['CTX_'+key+'_All'] = run_cistarget(ctx_db = ctx_db,
region_sets = regions,
specie = species,
auc_threshold = ctx_auc_threshold,
nes_threshold = ctx_nes_threshold,
rank_threshold = ctx_rank_threshold,
annotation = annotation,
motif_similarity_fdr = motif_similarity_fdr,
path_to_motif_annotations = path_to_motif_annotations,
n_cpu = n_cpu,
_temp_dir= _temp_dir,
annotation_version = annotation_version)
menr['DEM_'+key+'_All'] = DEM(dem_db = dem_db,
region_sets = regions,
log2fc_thr = dem_log2fc_thr,
motif_hit_thr = dem_motif_hit_thr,
max_bg_regions = dem_max_bg_regions,
specie = species,
promoter_space = promoter_space,
motif_annotation = annotation,
motif_similarity_fdr = motif_similarity_fdr,
path_to_motif_annotations = path_to_motif_annotations,
n_cpu = n_cpu,
annotation_version = annotation_version,
tmp_dir = save_path,
_temp_dir= _temp_dir)
This will run the analysis without taking into account promoters.
If you still would want to take into account promoters you would have to download the genome annotation manually (from somewhere where you do have network access). And follow the template of this function: https://github.com/aertslab/scenicplus/blob/main/src/scenicplus/wrappers/run_pycistarget.py with you annotation.
Sorry for the inconvenience at the moment. Like I said, we will change this in the next release.
Best,
Seppe
Hello, I ran according to the pseudo code you gave, but it reported an error again, I failed to use the source code to eliminate this bug
from pycistarget.motif_enrichment_cistarget import *
from pycistarget.motif_enrichment_dem import *
ctx_db_path = '/1.database/ScenicPlus/hg38_screen_v10_clust.regions_vs_motifs.rankings.feather'
dem_db_path='/1.database/ScenicPlus/hg38_screen_v10_clust.regions_vs_motifs.scores.feather'
regions = region_sets[key]
ctx_db = cisTargetDatabase(ctx_db_path, regions)
dem_db = DEMDatabase(dem_db_path, regions)
menr = {}
for key in region_sets.keys():
regions = region_sets[key]
menr['CTX_'+key+'_All'] = run_cistarget(ctx_db = ctx_db,
region_sets = regions,
specie = 'homo_sapiens',
auc_threshold = 0.005,
nes_threshold = 3.0,
rank_threshold = 0.05,
annotation = ['Direct_annot', 'Orthology_annot'],
motif_similarity_fdr = 0.000001,
path_to_motif_annotations = motif_annotation,
n_cpu = 1,
_temp_dir= tmp_dir,
annotation_version = 'v10nr_clust')
menr['DEM_'+key+'_All'] = DEM(dem_db = dem_db,
region_sets = regions,
log2fc_thr = 0.5,
motif_hit_thr = 3.0,
max_bg_regions = 500,
specie = 'homo_sapiens',
promoter_space = 500,
motif_annotation = ['Direct_annot', 'Orthology_annot'],
motif_similarity_fdr = 0.000001,
path_to_motif_annotations = motif_annotation,
n_cpu = 1,
annotation_version = 'v10nr_clust',
tmp_dir = "/ScenicPlus_tutorial/pbmc_tutorial/tmp",
_temp_dir= tmp_dir)
The error is as follows:
Traceback (most recent call last): File "
", line 3, in File "/jdfssz1/ST_SUPERCELLS/P21Z10200N0134/USER/huangwaidong/2.software/mn3/envs/scenicplus/lib/python3.8/site-packages/pycistarget/motif_enrichment_cistarget.py", line 543, in run_cistarget ctx_dict = [ctx_internal(ctx_db = ctx_db, File "/jdfssz1/ST_SUPERCELLS/P21Z10200N0134/USER/huangwaidong/2.software/mn3/envs/scenicplus/lib/python3.8/site-packages/pycistarget/motif_enrichment_cistarget.py", line 543, in ctx_dict = [ctx_internal(ctx_db = ctx_db, File "/jdfssz1/ST_SUPERCELLS/P21Z10200N0134/USER/huangwaidong/2.software/mn3/envs/scenicplus/lib/python3.8/site-packages/pycistarget/motif_enrichment_cistarget.py", line 707, in ctx_internal ctx_result.run_ctx(ctx_db) File "/jdfssz1/ST_SUPERCELLS/P21Z10200N0134/USER/huangwaidong/2.software/mn3/envs/scenicplus/lib/python3.8/site-packages/pycistarget/motif_enrichment_cistarget.py", line 299, in run_ctx self.regions_to_db = ctx_db.regions_to_db[self.name] if type(ctx_db.regions_to_db) == dict else ctx_db.regions_to_db.loc[set(coord_to_region_names(self.region_set)) & set(ctx_db.regions_to_db['Target'])] KeyError: 'Topic1'
Could you give me some suggestions? Thank you very much
Hi @WhaleGe!
Can you show the keys of regions
? I think in this case is a format issue.
C
Is this?
for key in region_sets.keys(): ... print(f'{key}: {region_sets[key].keys()}') ... topics_otsu: dict_keys(['Topic1', 'Topic2', 'Topic3', 'Topic4', 'Topic5', 'Topic6', 'Topic7', 'Topic8', 'Topic9', 'Topic10', 'Topic11', 'Topic12', 'Topic13', 'Topic14', 'Topic15', 'Topic16']) topics_top_3: dict_keys(['Topic1', 'Topic2', 'Topic3', 'Topic4', 'Topic5', 'Topic6', 'Topic7', 'Topic8', 'Topic9', 'Topic10', 'Topic11', 'Topic12', 'Topic13', 'Topic14', 'Topic15', 'Topic16']) DARs: dict_keys(['B_cells_1', 'B_cells_2', 'CD14+_Monocytes', 'CD4_T_cells', 'CD8_T_cells', 'Dendritic_cells', 'FCGR3A+_Monocytes', 'NK_cells'])
I meant here:
regions = region_sets[key]
ctx_db = cisTargetDatabase(ctx_db_path, regions)
dem_db = DEMDatabase(dem_db_path, regions)
How does regions look like here? What key are you using?
I would update the code to this:
from pycistarget.motif_enrichment_cistarget import *
from pycistarget.motif_enrichment_dem import *
ctx_db_path = '/1.database/ScenicPlus/hg38_screen_v10_clust.regions_vs_motifs.rankings.feather'
dem_db_path='/1.database/ScenicPlus/hg38_screen_v10_clust.regions_vs_motifs.scores.feather'
menr = {}
for key in region_sets.keys():
regions = region_sets[key]
ctx_db = cisTargetDatabase(ctx_db_path, regions)
dem_db = DEMDatabase(dem_db_path, regions)
menr['CTX_'+key+'_All'] = run_cistarget(ctx_db = ctx_db,
region_sets = regions,
specie = 'homo_sapiens',
auc_threshold = 0.005,
nes_threshold = 3.0,
rank_threshold = 0.05,
annotation = ['Direct_annot', 'Orthology_annot'],
motif_similarity_fdr = 0.000001,
path_to_motif_annotations = motif_annotation,
n_cpu = 1,
_temp_dir= tmp_dir,
annotation_version = 'v10nr_clust')
menr['DEM_'+key+'_All'] = DEM(dem_db = dem_db,
region_sets = regions,
log2fc_thr = 0.5,
motif_hit_thr = 3.0,
max_bg_regions = 500,
specie = 'homo_sapiens',
promoter_space = 500,
motif_annotation = ['Direct_annot', 'Orthology_annot'],
motif_similarity_fdr = 0.000001,
path_to_motif_annotations = motif_annotation,
n_cpu = 1,
annotation_version = 'v10nr_clust',
tmp_dir = "/ScenicPlus_tutorial/pbmc_tutorial/tmp",
_temp_dir= tmp_dir)
Thanks a lot, no more mistakes here.
Sorry, I found that running run_scenicplus also requires networking. Does this function have to be connected to the Internet? Can it be bypassed? Also, when will the next version be released?
Hi @WhaleGe !
Im not sure what you are referring to. You can pass extra parameters from the function (**kwargs) to ray.init(): https://docs.ray.io/en/latest/ray-core/package-ref.html . I think setting address='local' could solve your issue? You can check further details in ray documentation (maybe open an issue there), this is what we use for parallelization.
hi @WhaleGe
I think @cbravo93 misunderstood you're question.
Indeed the run_scenicplus
function requires networking by default, this to calculate the search space.
You can circumvent this however like this:
The gene annotation should look like this:
In [1]: pr_annot
Out [2]:
+--------------+-----------+-----------+--------------+------------+----------------------------+-------------------+
| Chromosome | Start | End | Strand | Gene | Transcription_Start_Site | Transcript_type |
| (category) | (int32) | (int32) | (category) | (object) | (int64) | (object) |
|--------------+-----------+-----------+--------------+------------+----------------------------+-------------------|
| chr1 | 1471765 | 1497848 | + | ATAD3B | 1471765 | protein_coding |
| chr1 | 1471765 | 1497848 | + | ATAD3B | 1471784 | protein_coding |
| chr1 | 3069168 | 3438621 | + | PRDM16 | 3069168 | protein_coding |
| chr1 | 3069168 | 3438621 | + | PRDM16 | 3069197 | protein_coding |
| ... | ... | ... | ... | ... | ... | ... |
| chrY | 6865918 | 6911752 | - | AMELY | 6872608 | protein_coding |
| chrY | 21903618 | 21918042 | - | RBMY1E | 21918032 | protein_coding |
| chrY | 24045229 | 24048019 | - | CDY1B | 24047969 | protein_coding |
| chrY | 24045229 | 24048019 | - | CDY1B | 24048019 | protein_coding |
+--------------+-----------+-----------+--------------+------------+----------------------------+-------------------+
Stranded PyRanges object has 85,990 rows and 7 columns from 330 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
Containing only information about protein coding genes. With Start end End representing resp. the start and the end of the gene. You can download this information locally via biomart http://www.ensembl.org/info/data/biomart/index.html.
The chromosome sizes should look like this:
In [3]: pr_chromsizes
Out [4]:
+------------------------+-----------+-----------+
| Chromosome | Start | End |
| (category) | (int32) | (int32) |
|------------------------+-----------+-----------|
| chr1 | 0 | 248956422 |
| chr1_GL383518v1_alt | 0 | 182439 |
| chr1_GL383519v1_alt | 0 | 110268 |
| chr1_GL383520v2_alt | 0 | 366580 |
| ... | ... | ... |
| chrX_KI270881v1_alt | 0 | 144206 |
| chrX_KI270913v1_alt | 0 | 274009 |
| chrY | 0 | 57227415 |
| chrY_KI270740v1_random | 0 | 37240 |
+------------------------+-----------+-----------+
Unstranded PyRanges object has 455 rows and 3 columns from 455 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
With Start always equal to 0 and End equal to the length of the chromosome. You can get this information from the fasta index file from your genome (.fa.fai file, see http://www.htslib.org/doc/faidx.html).
from scenicplus.enhancer_to_gene import get_search_space
get_search_space(
scplus_obj,
species = None,
assembly = None,
pr_annot = pr_annot,
pr_chromsizes = pr_chromsizes,
upstream = [1000, 150000],
downstream = [1000, 150000])
scplus_obj.uns['search_space']
the calculation will be skipped in the wrapper function and no networking should be required anymore.
from scenicplus.wrappers.run_scenicplus import run_scenicplus
try:
run_scenicplus(
scplus_obj = scplus_obj,
variable = ['GEX_celltype'],
species = 'hsapiens',
assembly = 'hg38',
tf_file = 'pbmc_tutorial/data/utoronto_human_tfs_v_1.01.txt',
save_path = os.path.join(work_dir, 'scenicplus'),
biomart_host = biomart_host,
upstream = [1000, 150000],
downstream = [1000, 150000],
calculate_TF_eGRN_correlation = True,
calculate_DEGs_DARs = True,
export_to_loom_file = True,
export_to_UCSC_file = True,
path_bedToBigBed = 'pbmc_tutorial',
n_cpu = 12,
_temp_dir = os.path.join(tmp_dir, 'ray_spill'))
except Exception as e:
#in case of failure, still save the object
dill.dump(scplus_obj, open(os.path.join(work_dir, 'scenicplus/scplus_obj.pkl'), 'wb'), protocol=-1)
raise(e)
Hope this helps,
Feel free to reopen the issue if not.
Best,
Seppe
Thanks a lot. I bypassed this step of needing network and followed yours. But again it produces new error. I read it's (#30) but I didn't get it. I am using the PBMC data from the first tutorial.
Error output
Traceback (most recent call last):
File "<stdin>", line 22, in <module>
File "<stdin>", line 2, in <module>
File "/hwfssz1/ST_SUPERCELLS/P21Z10200N0134/USER/huangwaidong/2.software/scenicplus/src/scenicplus/wrappers/run_scenicplus.py", line 128, in run_scenicplus
merge_cistromes(scplus_obj)
File "/hwfssz1/ST_SUPERCELLS/P21Z10200N0134/USER/huangwaidong/2.software/scenicplus/src/scenicplus/cistromes.py", line 86, in merge_cistromes
raise ValueError("No cistromes found! Make sure that the motif enrichment results look good!")
ValueError: No cistromes found! Make sure that the motif enrichment results look good!
Since I'm running in the command line, I can't see the enriched results
Could you offer some advice?
Hi
You can generate a html file like this, which you can open in any webbrowser:
with open('output.html', 'w') as f:
f.write(menr['DEM_topics_otsu_All']).DEM_results('Topic8').data)
Could you please post further questions in the discussion tab instead of commenting to this issue?
Thanks,
Best
Seppe
Hello, I found that running run_pycistarget requires networking. How can I run it locally, because the HPC cannot connect to external websites.
运行报错 raise HTTPError(http_error_msg, response=self) requests.exceptions.HTTPError: 403 Client Error: Forbidden for url