aertslab / create_cisTarget_databases

Create cisTarget databases

Wrong feather file? #33

Closed: AIBio closed this issue 1 year ago

AIBio commented 1 year ago

Hi,

I want to create ranked feather files for the pig genome (Sus_scrofa), so I followed all the steps described in https://github.com/aertslab/create_cisTarget_databases/issues/4. Here is my code:

# Step 1
create_cistarget_databases_dir=/home/yhw/software/create_cisTarget_databases
fasta_filename=/home/yhw/refgenome/cistarget/Sus_scrofa.Sscrofa11.1.107_CellRanger_gene_pro10k.fa
motifs_dir=/home/yhw/refgenome/jaspar/v2022/motifs_cb_format
motifs_list_filename=/home/yhw/refgenome/jaspar/v2022/motifs.lst
db_prefix=Sus11__refseq-r80__10kb_up_and_down_tss
nbr_threads=20
nbr_total_parts=10
mkdir partial
# Each invocation of the for loop (with different ${current_part}) can also be submitted to a different node to speedup the motif scoring.
for current_part in $(seq 1 ${nbr_total_parts}) ; do
    "${create_cistarget_databases_dir}/create_cistarget_motif_databases.py" \
         -f "${fasta_filename}" \
         -M "${motifs_dir}" \
         -m "${motifs_list_filename}" \
         -p "${current_part}" "${nbr_total_parts}" \
         -o "partial/${db_prefix}" \
         -t "${nbr_threads}" \
         -g "\|ENSSSCT[0-9.]+$"
done
# Step 2
"${create_cistarget_databases_dir}/combine_partial_motifs_or_tracks_vs_regions_or_genes_scores_cistarget_dbs.py" \
    -i partial/ -o .
rm -rf partial
# Step 3
db_filename=${db_prefix}.motifs_vs_genes.scores.feather
"${create_cistarget_databases_dir}/convert_motifs_or_tracks_vs_regions_or_genes_scores_to_rankings_cistarget_dbs.py" \
    -i "${db_filename}"

The sequence names in the FASTA file look like ">FAM120B|ENSSSCT00000055371".
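As a quick sanity check, the sketch below applies the same regular expression that is passed to `-g` to one header, assuming the option works by removing the matched (non-gene) part of the sequence ID so that only the gene name remains. The helper name `region_to_gene` is hypothetical and not part of create_cisTarget_databases.

```shell
# Hypothetical helper: strip the transcript suffix from a sequence ID,
# using the same extended regex as the -g option in Step 1.
region_to_gene() {
    printf '%s\n' "$1" | sed -E 's/\|ENSSSCT[0-9.]+$//'
}

region_to_gene "FAM120B|ENSSSCT00000055371"   # prints FAM120B
```

If the printed value is not the bare gene name used in the expression matrix, the regex or the FASTA headers probably need adjusting before scoring.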

I used the feather file generated above to run SCENIC. (Note: with the feather files your group provides, I can obtain the final results.)

However, I ran into a problem. Here is the error log:

  File "/home/yhw/software/anaconda3/envs/sc.p37/lib/python3.7/site-packages/pyscenic/transform.py", line 503, in df2regulons
    assert not df.empty, "Signatures dataframe is empty!"
AssertionError: Signatures dataframe is empty!

Could you help me solve this problem? Alternatively, could you provide the human TF motif files in "cb_format" that your group used to generate the feather files in the cisTarget database?

I'm looking forward to your reply!

Best wishes~ Hanwen

ghuls commented 1 year ago

Does the list of gene names you provided match the ones in the database (e.g. `FAM120B`)?
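One rough way to check this is to count how many gene names the two lists share. The sketch below assumes both lists have already been exported to plain text files with one unique gene name per line; the helper name, the file names and the toy gene names other than `FAM120B` are hypothetical.

```shell
# Hypothetical helper: count gene names in list $2 that also occur in list $1.
# Both files are assumed to hold one gene name per line, without duplicates.
count_shared_genes() {
    # -F fixed strings, -x whole-line match, -c count, -f patterns from file.
    # grep exits non-zero when nothing matches, so keep the status at 0.
    grep -Fxc -f "$1" "$2" || true
}

# Toy data standing in for the real lists.
tmpdir=$(mktemp -d)
printf '%s\n' FAM120B GENE_A GENE_B > "${tmpdir}/expression_genes.txt"
printf '%s\n' FAM120B GENE_B GENE_C > "${tmpdir}/database_genes.txt"

count_shared_genes "${tmpdir}/expression_genes.txt" "${tmpdir}/database_genes.txt"   # prints 2
rm -rf "${tmpdir}"
```

A count near zero would suggest that the two sides use different identifiers (for example symbols versus Ensembl IDs).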

The motifs will be made publicly available upon publication of the SCENIC+ paper.

AIBio commented 1 year ago

Thank you for your reply.

However, I found that the transcription factor IDs in the cb-format motif files differ from those in the tbl-format annotation file, which causes the problem above.

I modified the motif filenames and motif IDs in the cb-format files, and that solved the problem.
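A mismatch like this can be spotted before scoring with a small check along the following lines. It is only a sketch under two assumptions: each motif file is named `<motif_id>.cb`, and the annotation table is tab-separated with the motif ID in its first column. The helper name `motifs_missing_from_tbl` is hypothetical.

```shell
# Hypothetical helper: print motif IDs (taken from <motif_id>.cb file names in
# directory $1) that do not appear in the first tab-separated column of the
# annotation table $2.
motifs_missing_from_tbl() {
    for cb_file in "$1"/*.cb; do
        # Skip the unexpanded pattern when the directory has no .cb files.
        [ -e "${cb_file}" ] || continue
        motif_id=$(basename "${cb_file}" .cb)
        cut -f 1 "$2" | grep -Fxq -- "${motif_id}" || printf '%s\n' "${motif_id}"
    done
}
```

With the variables from the script above, it could be called as `motifs_missing_from_tbl "${motifs_dir}" annotation.tbl`, where `annotation.tbl` is a placeholder for the actual annotation file. Any ID it prints has no matching annotation entry.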

However, because the transcription factor library I collected manually is limited, many transcription factors cannot be scored.

Finally, when will the SCENIC+ paper be published? We would love to have all the cb-format motifs included in your tbl files available soon.

ghuls commented 1 year ago

Our SCENIC+ public motif collection is now available: https://resources.aertslab.org/cistarget/motif_collections/

krayon4river commented 7 months ago

[Image: WeChat screenshot 20231206100214]

I followed the steps above and generated the results, but the output file 'genes_vs_motifs.rankings.feather' is missing. Should I change the FASTA input to one obtained from BioMart, as mentioned in https://github.com/aertslab/create_cisTarget_databases/issues/16?