aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

Issue when running pycistarget for custom species zebrafish #137

Closed JoGraesslin closed 1 year ago

JoGraesslin commented 1 year ago

Hello, I am trying to set up SCENIC+ for zebrafish. I used the Ensembl genome annotation for alignment and followed the workflow up to the pycistarget step:

import os

from scenicplus.wrappers.run_pycistarget import run_pycistarget

# region_sets, custom_annotation, rankings_db, scores_db, motif_annotation,
# work_dir and tmp_dir are defined in earlier steps of the workflow.
run_pycistarget(
    region_sets=region_sets,
    species='custom',
    custom_annot=custom_annotation,
    save_path=os.path.join(work_dir, 'motifs'),
    ctx_db_path=rankings_db,
    dem_db_path=scores_db,
    path_to_motif_annotations=motif_annotation,
    run_without_promoters=True,
    n_cpu=8,
    _temp_dir=os.path.join(tmp_dir, 'ray_spill'),
    annotation_version="2020",
)

It runs without an error message, but the output is missing the annotation:

image

I have created a .tbl file from JASPAR using orthology databases:

image

And a custom_annotation file based on the .gtf file, which looks like this:

image

Do you have an idea what went wrong? I have used the Ensembl annotation all the way, and also tried with UCSC chromosomes. However, in all species, my Direct_annot and Orthology_annot columns are NaN.

Any help would be appreciated, since I no longer know what else to try. Thanks!

Originally posted by @JoGraesslin in https://github.com/aertslab/scenicplus/issues/56#issuecomment-1491748738

SidG13 commented 1 year ago

Just a thought, try reformatting the .tbl file so that the column names match mine. Maybe the column order/names have something to do with it.

image

Also, have you checked the other topics for annotations? Sometimes some are NaN but others have annotated TFs.

JoGraesslin commented 1 year ago

Thanks for your reply @SidG13! I have tried it out and unfortunately I am running into the same problem. I have checked all different topics and they are all missing annotation.

JoGraesslin commented 1 year ago

It must have been an issue with my annotation or .feather file. I have created a .tbl that uses the SCENIC+ public motif collection and renamed the human gene names in the .tbl to zebrafish gene names (using the Ensembl BioMart, OMA and Alliance orthology databases), and now it works :)
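The thread does not pin down the exact cause, but one plausible explanation for all-NaN annotation columns is that the motif IDs in the ranking and score databases do not occur in the .tbl file. The sketch below shows how such a mismatch could be checked. It is a minimal illustration under assumptions: the `#motif_id` column name is assumed from the public motif tables, the motif IDs and gene names are toy values, and in practice the database motif IDs would be read from the .feather file instead of a hard-coded list.

```python
import csv
import io


def read_tbl_motif_ids(tbl_text, id_column="#motif_id"):
    """Collect the motif IDs listed in a tab-separated motif annotation table."""
    reader = csv.DictReader(io.StringIO(tbl_text), delimiter="\t")
    return {row[id_column] for row in reader}


def report_overlap(db_motif_ids, tbl_motif_ids):
    """Summarise how many database motifs have an entry in the annotation table."""
    db_ids = set(db_motif_ids)
    shared = db_ids & set(tbl_motif_ids)
    return {
        "n_database": len(db_ids),
        "n_annotated": len(shared),
        "missing": sorted(db_ids - shared),
    }


# Toy data standing in for the real files (hypothetical motif IDs).
example_tbl = "#motif_id\tgene_name\nmotifA\tsox2\nmotifB\tpax6a\n"
example_db_ids = ["motifA", "motifC"]

summary = report_overlap(example_db_ids, read_tbl_motif_ids(example_tbl))
print(summary)
```

If `n_annotated` is zero or very small, the database and the annotation table were probably built from different motif collections, which would be consistent with the fix described here.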

fengweimin-maker commented 1 year ago

Hi @JoGraesslin,

I have built .feather and .tbl files for the axolotl, but it failed. Could you give an example of how to make the .feather and .tbl files for other species? I would be very grateful for your help.

Best, Weimin

JoGraesslin commented 1 year ago

Hi @fengweimin-maker, I downloaded the motif collections from https://resources.aertslab.org/cistarget/motif_collections/. Then I replaced all gene names in the human .tbl file from the /snapshots folder with the names of the orthologous zebrafish genes. I found these in several databases (Ensembl, OMA, Alliance), which list orthologous genes based on amino acid similarity between human and zebrafish proteins. I don't know where to find comparable databases for the axolotl; you might have to do some research in your community.
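The renaming step described above could be sketched roughly as follows. This is not the exact script used in the thread: the `gene_name` column name is an assumption based on the public motif tables, the ortholog mapping would normally be built from an Ensembl, OMA or Alliance export, and the motif IDs and the gene `ZNF999` are hypothetical toy values. The sketch drops genes without an ortholog and writes one row per ortholog when a human gene maps to several zebrafish genes.

```python
import csv
import io


def rename_genes_to_orthologs(tbl_text, orthologs, gene_column="gene_name"):
    """Rewrite a motif annotation table so gene names are target-species orthologs.

    Rows whose gene has no ortholog are dropped; genes with several
    orthologs produce one row per ortholog.
    """
    reader = csv.DictReader(io.StringIO(tbl_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        for ortholog in orthologs.get(row[gene_column], []):
            writer.writerow({**row, gene_column: ortholog})
    return out.getvalue()


# Toy human table and human-to-zebrafish mapping (illustrative values only).
human_tbl = "#motif_id\tgene_name\nmotifA\tSOX2\nmotifB\tPAX6\nmotifC\tZNF999\n"
orthologs = {"SOX2": ["sox2"], "PAX6": ["pax6a", "pax6b"]}

zebrafish_tbl = rename_genes_to_orthologs(human_tbl, orthologs)
print(zebrafish_tbl)
```

Keeping every other column untouched means the motif IDs still match those in the motif collection, which is what links the table to the databases.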

For the .feather, you use the consensus_regions.fa and the motif db as input:

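# Input FASTA with the consensus regions, motif directory and motif list
consensus="/scenicplus/scATAC/consensus_peak_calling/consensus_regions.fa"
motifs_cb="/v10nr_clust_public/singletons/"
motifs_list="/snapshots/motifs.lst"
out="outdir"
ncpu="20"

# Build the cisTarget motif databases (.feather) for the custom regions
create_cistarget_motif_databases.py \
    -f "$consensus" \
    -M "$motifs_cb" \
    -m "$motifs_list" \
    -o "$out" \
    -t "$ncpu" \
    -g '#?$'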

Hope this helps! Best, Jo

mtrebelo commented 1 year ago

Hi @JoGraesslin,

I am also using a zebrafish dataset with SCENIC+. I am exactly at this point of the analysis - I have the feather files for rankings_db and scores_db; I have created a custom_annotation based on .gtf but I am having issues with the .tbl file. I have done as you mention but the file must have some issue because it has not been able to run. Would you be able to make the tbl file available? Thanks a lot in advance

JoGraesslin commented 1 year ago

@mtrebelo check this out https://github.com/JoGraesslin/Zebrafish_SCENIC

fengweimin-maker commented 1 year ago

Thank you very much!

Best wishes, Weimin

mtrebelo commented 1 year ago

Hi @JoGraesslin,

Thanks a lot! I have created it without problems.

However, when running pycistarget, I am getting this:

2023-06-08 09:34:58,354 pycisTarget_wrapper INFO     Loading cisTarget database for topics_otsu
2023-06-08 09:34:58,355 cisTarget    INFO     Reading cisTarget database

and then this error:

AttributeError: ('PyRanges object has no attribute', 'Overlap')

Did this by any chance occur with you?
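The thread does not say what caused this error. Attribute errors of this kind can appear when the installed version of a dependency differs from the one the calling code was written against; that is only an assumption here, not a confirmed diagnosis. A small sketch for collecting the installed versions to include in a bug report, using only the standard library:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Print the versions of the packages involved in the traceback.
for name in ("pyranges", "pycistarget", "scenicplus"):
    print(name, installed_version(name))
```

Comparing these versions with the ones pinned in the SCENIC+ installation instructions is a reasonable first check before looking at the input files.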

Best, Maria