Get chromosome sizes (for hg38 here)

import pyranges as pr
import requests
import pandas as pd
target_url='http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes'
chromsizes=pd.read_csv(target_url, sep='\t', header=None)
chromsizes.columns=['Chromosome', 'End']
chromsizes['Start']=[0]*chromsizes.shape[0]
chromsizes=chromsizes.loc[:,['Chromosome', 'Start', 'End']]

Exceptionally in this case, to agree with CellRangerARC annotations

chromsizes['Chromosome'] = [chromsizes['Chromosome'][x].replace('v', '.') for x in range(len(chromsizes['Chromosome']))]
chromsizes['Chromosome'] = [chromsizes['Chromosome'][x].split('_')[1] if len(chromsizes['Chromosome'][x].split('_')) > 1 else chromsizes['Chromosome'][x] for x in range(len(chromsizes['Chromosome']))]
chromsizes=pr.PyRanges(chromsizes)
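To make the renaming step easier to check, here is a minimal sketch of what it does to individual names, assuming the separator is an underscore as in UCSC names such as `chr1_KI270706v1_random`. The helper name `normalize_chrom_name` and the example names are illustrative and not part of SCENIC+.

```python
def normalize_chrom_name(name: str) -> str:
    """Replace 'v' with '.', then keep the second '_'-separated field if there is one."""
    renamed = name.replace("v", ".")
    parts = renamed.split("_")
    # Names without an underscore (e.g. 'chr1') are returned unchanged.
    return parts[1] if len(parts) > 1 else renamed


# Illustrative UCSC-style names; the real input is the first column of hg38.chrom.sizes.
examples = ["chr1", "chrX", "chr1_KI270706v1_random", "chrUn_GL000195v1"]
normalized = {name: normalize_chrom_name(name) for name in examples}

for original, new in normalized.items():
    print(f"{original} -> {new}")
```

Applied to a DataFrame, the same function could be used as `chromsizes['Chromosome'].map(normalize_chrom_name)`, which should be equivalent to the two list comprehensions.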

I ran this for several samples before and it worked well. Now, however, the link no longer seems to work, and I get the following error:

HTTPError                                 Traceback (most recent call last)
/Users/stur/Teaseq/Scenicplus/BM4/Scenicplus_BM4_Part_1.ipynb Cell 42 line 6
      4 import pandas as pd
      5 target_url='http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes'
----> 6 chromsizes=pd.read_csv(target_url, sep='\t', header=None)
      7 chromsizes.columns=['Chromosome', 'End']
      8 chromsizes['Start']=[0]*chromsizes.shape[0]

File /opt/homebrew/Caskroom/miniconda/base/lib/python3.10/site-packages/pandas/util/_decorators.py:211, in deprecate_kwarg.<locals>._deprecate_kwarg.<locals>.wrapper(*args, **kwargs)
    209 else:
    210     kwargs[new_arg_name] = new_arg_value
--> 211 return func(*args, **kwargs)

File /opt/homebrew/Caskroom/miniconda/base/lib/python3.10/site-packages/pandas/util/_decorators.py:317, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    311 if len(args) > num_allow_args:
    312     warnings.warn(
    313         msg.format(arguments=arguments),
    314         FutureWarning,
    315         stacklevel=find_stack_level(inspect.currentframe()),
    316     )
--> 317 return func(*args, **kwargs)

File /opt/homebrew/Caskroom/miniconda/base/lib/python3.10/site-packages/pandas/io/parsers/readers.py:950, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
    935 kwds_defaults = _refine_defaults_read(
...
File /opt/homebrew/Caskroom/miniconda/base/lib/python3.10/urllib/request.py:643, in HTTPDefaultErrorHandler.http_error_default(self, req, fp, code, msg, hdrs)
    642 def http_error_default(self, req, fp, code, msg, hdrs):
--> 643     raise HTTPError(req.full_url, code, msg, hdrs, fp)

HTTPError: HTTP Error 403: Forbidden

I also tried opening the URL in a web browser, and it does not work there either.

What should I do?

Best,

Steven

wgao688 commented 7 months ago

For some reason UCSC changed access privileges. You need to use this link instead now: https://hgdownload-test.gi.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes

SteveTur commented 7 months ago

Thank you! I have another problem in the next step:

from scenicplus.wrappers.run_scenicplus import run_scenicplus
try:
    run_scenicplus(
        scplus_obj = scplus_obj,
        variable = ['GEX_celltype'],
        species = 'hsapiens',
        assembly = 'hg38',
        tf_file = '/Users/stur/utoronto_human_tfs_v_1.01.txt',
        save_path = os.path.join(work_dir, 'scenicplus'),
        biomart_host = biomart_host,
        upstream = [1000, 150000],
        downstream = [1000, 150000],
        calculate_TF_eGRN_correlation = True,
        calculate_DEGs_DARs = True,
        export_to_loom_file = True,
        export_to_UCSC_file = True,
        path_bedToBigBed = '/Users/stur/BM4/',
        n_cpu = 12,
        _temp_dir = '/tmp/ray_spill')
except Exception as e:
    # in case of failure, still save the object
    dill.dump(scplus_obj, open(os.path.join(work_dir, 'scenicplus/scplus_obj.pkl'), 'wb'), protocol=-1)
    raise(e)

One of my other samples didn't encounter any issues with that step.

Here is the error:

I found similar errors on GitHub, but no solutions.

Thank you again for your help!

Best,

Steven

SeppeDeWinter commented 7 months ago

Hi @SteveTur

What does your motif enrichment look like? Are your motifs correctly annotated to TFs? Can you share an example motif enrichment output HTML file?

All the best,

Seppe

ghuls commented 7 months ago

http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes seems accessible again.
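Since the original URL returned a 403 for a while, one way to make the notebook less dependent on a single host is to try the two URLs mentioned in this thread in order. The following is only a sketch using the standard library; the function names (`download_text`, `fetch_first_available`, `parse_chrom_sizes`) are hypothetical helpers, not part of SCENIC+ or pandas, and whether either host stays available is not something this code can guarantee.

```python
import urllib.error
import urllib.request

# The two URLs mentioned in this thread, tried in this order.
CHROMSIZES_URLS = [
    "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes",
    "https://hgdownload-test.gi.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes",
]


def download_text(url, timeout=30):
    """Download a URL and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def fetch_first_available(urls, fetch=download_text):
    """Return the body of the first URL that can be fetched without an HTTP/URL error."""
    errors = []
    for url in urls:
        try:
            return fetch(url)
        except urllib.error.URLError as err:  # HTTPError (e.g. 403) is a subclass
            errors.append(f"{url}: {err}")
    raise RuntimeError("No chrom.sizes URL was reachable:\n" + "\n".join(errors))


def parse_chrom_sizes(text):
    """Turn tab-separated 'name<TAB>size' lines into (Chromosome, Start, End) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        chrom, size = line.split("\t")
        rows.append((chrom, 0, int(size)))
    return rows
```

The result can then be passed to pandas, for example `pd.DataFrame(parse_chrom_sizes(fetch_first_available(CHROMSIZES_URLS)), columns=['Chromosome', 'Start', 'End'])`, before the renaming step and the `pr.PyRanges` conversion. Saving a local copy of the file once it has been downloaded would avoid the dependency on the server entirely.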

aertslab / scenicplus

Chromsizes URL #296
