Hi @SteveTur
It seems like your error occurs when downloading the genome annotation from biomart. Does the computer you are running this analysis on have access to the internet?
Alternatively, you can provide a pre-downloaded annotation file using the custom_annot parameter.
The file should look like this:
        Chromosome      Start  Strand     Gene  Transcript_type
  8053        chrY   22490397       1      PRY   protein_coding
  8153        chrY   12662368       1    USP9Y   protein_coding
  8155        chrY   12701231       1    USP9Y   protein_coding
  8158        chrY   12847045       1    USP9Y   protein_coding
  8328        chrY   22096007      -1     PRY2   protein_coding
   ...         ...        ...     ...      ...              ...
246958        chr1  181483738       1  CACNA1E   protein_coding
246960        chr1  181732466       1  CACNA1E   protein_coding
246962        chr1  181776101       1  CACNA1E   protein_coding
246963        chr1  181793668       1  CACNA1E   protein_coding
246965        chr1  203305519       1     BTG2   protein_coding
[78812 rows x 5 columns]
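Before passing a pre-downloaded table along, it can help to check that it matches the layout shown above. The following is only a sketch using the Python standard library: the helper name `check_annotation_rows` and the tab-separated sample text are illustrative assumptions, not part of SCENIC+, and the checks (integer Start, Strand of 1 or -1) are inferred from the example rows rather than from a documented specification.

```python
import csv
import io

# Column names taken from the example table above.
EXPECTED_COLUMNS = ["Chromosome", "Start", "Strand", "Gene", "Transcript_type"]


def check_annotation_rows(tsv_text):
    """Parse tab-separated annotation text and return a list of problems found.

    Hypothetical helper for illustration; an empty list means none of these
    (inferred) checks failed.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    problems = []
    if reader.fieldnames != EXPECTED_COLUMNS:
        problems.append(f"unexpected columns: {reader.fieldnames}")
        return problems
    for line_no, row in enumerate(reader, start=2):  # the header is line 1
        if not (row["Start"] or "").isdigit():
            problems.append(f"line {line_no}: Start is not an integer")
        if row["Strand"] not in ("1", "-1"):
            problems.append(f"line {line_no}: Strand must be 1 or -1")
    return problems


# Two rows copied from the example above, written as tab-separated text.
sample = (
    "Chromosome\tStart\tStrand\tGene\tTranscript_type\n"
    "chrY\t22490397\t1\tPRY\tprotein_coding\n"
    "chrY\t22096007\t-1\tPRY2\tprotein_coding\n"
)
print(check_annotation_rows(sample))
```

A file that passes could then be loaded into a DataFrame, for example with `pandas.read_csv(path, sep="\t")`, and supplied through custom_annot.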
I hope this helps.
All the best,
Seppe
Hi Seppe,
Yes, I think it was a network problem. However, I later ran into another error, which is the one discussed before on GitHub:
AttributeError                            Traceback (most recent call last)
/Users/stur/Teaseq/Scenicplus/PB2/Scenicplus_PB2_Part_2.ipynb Cell 31 line 2
     20 except Exception as e:
     21     #in case of failure, still save the object
     22     dill.dump(scplus_obj, open(os.path.join(work_dir, 'scenicplus/scplus_obj.pkl'), 'wb'), protocol=-1)
---> 23     raise(e)

/Users/stur/Teaseq/Scenicplus/PB2/Scenicplus_PB2_Part_2.ipynb Cell 31 line 3
      1 from scenicplus.wrappers.run_scenicplus import run_scenicplus
      2 try:
----> 3     run_scenicplus(
      4         scplus_obj = scplus_obj,
      5         variable = ['GEX_celltype'],
      6         species = 'hsapiens',
      7         assembly = 'hg38',
      8         tf_file = '/Users/stur/Teaseq/Teaseq_Data_TAL1/PB2/utoronto_human_tfs_v_1.01.txt',
      9         save_path = os.path.join(work_dir, 'scenicplus'),
     10         biomart_host = biomart_host,
     11         upstream = [1000, 150000],
     12         downstream = [1000, 150000],
     13         calculate_TF_eGRN_correlation = True,
     14         calculate_DEGs_DARs = True,
     15         export_to_loom_file = True,
     16         export_to_UCSC_file = True,
...
--> 174     ), columns=cv.get_feature_names(), index=regulons.keys())
    175 regulon_mat = regulon_mat.reindex(columns=feature_names, fill_value=0).T
    176 if keep_direct_and_extended_if_not_direct is True:

AttributeError: 'CountVectorizer' object has no attribute 'get_feature_names'
You mentioned an update to the source code. Is this still the case? I am also unsure how to convert the object to open the ScenicPlus analysis on a genome browser.
Thank you for your help.
Best,
Steven
Hi Steven
What version of scikit-learn are you using?
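One way to report the installed versions from inside the notebook is sketched below, using only the standard library. The helper name `installed_version` is illustrative, and the package names passed to it are assumed to be the distribution names used at install time.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Distribution names are assumptions; adjust them to match your environment.
for name in ("scikit-learn", "scenicplus"):
    print(name, installed_version(name))
```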
All the best,
Seppe
Hi Seppe,
Here are my versions:
scikit-image 0.20.0
scikit-learn 1.2.2
scikit-misc 0.1.4
Best,
Steven
Hi @SteveTur
This should be fixed by this commit: https://github.com/aertslab/scenicplus/commit/e5ba6fcf42459b6e6b70e27359ddd11289d70cc5
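For context: scikit-learn deprecated `get_feature_names` in version 1.0 and removed it in 1.2 in favour of `get_feature_names_out`, which is consistent with the error seen under 1.2.2. The sketch below shows one way code can support both; it is not a reproduction of the linked commit, `feature_names_of` is a hypothetical helper, and the two small classes are stand-ins so the example runs without scikit-learn installed.

```python
def feature_names_of(vectorizer):
    """Return feature names from a fitted vectorizer across scikit-learn versions.

    Hypothetical helper: prefers get_feature_names_out (scikit-learn >= 1.0)
    and falls back to the older get_feature_names.
    """
    if hasattr(vectorizer, "get_feature_names_out"):
        return list(vectorizer.get_feature_names_out())
    return list(vectorizer.get_feature_names())


# Stand-in objects that mimic the newer and older vectorizer interfaces.
class NewStyle:
    def get_feature_names_out(self):
        return ["geneA", "geneB"]


class OldStyle:
    def get_feature_names(self):
        return ["geneA", "geneB"]


print(feature_names_of(NewStyle()))
print(feature_names_of(OldStyle()))
```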
All the best,
Seppe
Hi Seppe,
I managed to troubleshoot it. However, I am confused about which tutorial to follow for my own data. I have completed the 10X_PBMC one and got good results, but I would like to go deeper in the analysis and get, for example, the integrated multiome plot from the cerebellum tutorial. I am not sure I understand what exactly the genes.gtf file is and where it comes from. Do you provide it, or do I have to create it myself?
Best,
Steven
Hi @SteveTur
These gtf files are publicly available for almost any species, see for example https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/ for human.
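As background, a GTF file is plain text with nine tab-separated fields per record (seqname, source, feature, start, end, score, strand, frame, attribute). A minimal sketch for inspecting one record with the standard library follows; `parse_gtf_line` is a hypothetical helper and the toy record is invented purely for illustration.

```python
import re

GTF_FIELDS = ["seqname", "source", "feature", "start", "end",
              "score", "strand", "frame", "attribute"]


def parse_gtf_line(line):
    """Parse one GTF record into a dict; return None for comments or blank lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    parts = line.split("\t")
    if len(parts) != len(GTF_FIELDS):
        raise ValueError(f"expected 9 tab-separated fields, got {len(parts)}")
    record = dict(zip(GTF_FIELDS, parts))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # Attributes look like: gene_id "X"; transcript_id "Y";
    record["attribute"] = dict(re.findall(r'(\S+) "([^"]*)"', record["attribute"]))
    return record


# Toy record for illustration only; the names and coordinates are invented.
toy = 'chr1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "GENE1"; transcript_id "GENE1.1";'
rec = parse_gtf_line(toy)
print(rec["seqname"], rec["start"], rec["end"], rec["strand"], rec["attribute"]["gene_id"])
```

If the downloaded file is gzip-compressed, it can be read line by line with `gzip.open(path, "rt")`.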
I'm closing this issue.
If you have further questions about downstream analysis feel free to open a new discussion and I will be happy to help.
Good luck with your analysis.
All the best,
Seppe
Hi, it looks like
from scenicplus.wrappers.run_pycistarget import run_pycistarget
no longer works. Here is the error I get:
What should I do?
Best,
Steven