aertslab / CREsted

Mapping TF analysis with scRNA-seq #75

Closed kenxie7 closed 11 hours ago

kenxie7 commented 1 day ago

Hi!

I was following the "Enhancer Code Analysis" tutorial and ran into trouble in the last part (matching with scRNA-seq). When I generate html_paths from contribution_dir, the returned paths end in motifs.html, but no such file exists in that directory, and the call to find_pattern_matches below fails with the error at the bottom of this post.

classes = list(adata.obs_names)
contribution_dir = "modisco_results4"
html_paths = crested.tl.modisco.generate_html_paths(
    all_patterns, classes, contribution_dir
)

pattern_match_dict = crested.tl.modisco.find_pattern_matches(
    all_patterns, html_paths, q_val_thr=0.1
)  # q_val threshold to only select significant matches
print(pattern_match_dict)

Secondly, I couldn't find the motif_collection.tsv file and am not sure where to get it. I downloaded v10_nrclust_public, but it only contains the logos folder.

motif_to_tf_df = crested.tl.modisco.read_motif_to_tf_file(
    "/data/projects/c04/cbd-saerts/nkemp/tools/Motif_collection.tsv"
)
motif_to_tf_df

Many thanks for the great package!

ERROR:

 AttributeError                            Traceback (most recent call last)
Cell In[132], line 1
----> 1 pattern_match_dict = crested.tl.modisco.find_pattern_matches(
      2     all_patterns, html_paths, q_val_thr=0.1
      3 )  # q_val threshold to only select significant matches
      4 print(pattern_match_dict)

File ~/CREsted/src/crested/tl/modisco/_tfmodisco.py:956, in find_pattern_matches(all_patterns, html_paths, q_val_thr)
    946 pattern_id_parts = pattern_id_whole.split("_")
    947 pattern_id = (
    948     pattern_id_parts[-3]
    949     + "_"
   (...)
    954     + pattern_id_parts[-1]
    955 )
--> 956 matching_row = df_motif_database.loc[df_motif_database["pattern"] == pattern_id]
    957 matching_rows.append(matching_row)
    958 pattern_ids.append(pattern_id_whole)

AttributeError: 'str' object has no attribute 'loc'

Version information

No response

nkempynck commented 1 day ago

Hi Ken

Concerning your first issue: the whole 'matching with scRNA-seq' part requires that tfmodisco was run with a motif database, so that the patterns are matched against it with tomtom. You can do this by following the example in the notebook; that run also generates the motifs.html file in the output folder. Concerning your second issue: we made a motif collection available, along with the motif-to-TF mapping TSV file. You can download both through crested.get_motif_db(): https://github.com/aertslab/CREsted/blob/main/src/crested/_datasets.py
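
To see up front whether the tomtom reports exist, the paths can be checked before calling find_pattern_matches. This is a minimal sketch, assuming html_paths is a list of file path strings; split_existing_paths and example_paths are hypothetical names used for illustration and are not part of CREsted.

```python
from pathlib import Path


def split_existing_paths(paths):
    """Return (existing, missing) lists of the given file paths."""
    existing = [p for p in paths if Path(p).is_file()]
    missing = [p for p in paths if not Path(p).is_file()]
    return existing, missing


# Hypothetical stand-in for the list returned by generate_html_paths.
example_paths = ["modisco_results4/does_not_exist/motifs.html"]

existing, missing = split_existing_paths(example_paths)
for path in missing:
    print(f"Missing tomtom report: {path}")
```

If any path shows up as missing, tfmodisco was most likely run without the motif database for that class, which is consistent with the 'str' object error in the traceback above.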

Best Niklas

kenxie7 commented 11 hours ago

Hi Niklas,

Many thanks for pointing the way! My issue with get_motif_db() was that the SHA256 hash did not match, even after reinstalling from the GitHub repo (error below). However, I found the URL, downloaded the DB manually, and proceeded with the tutorial. A minor issue: tomtom could not generate the HTML when run from JupyterLab (notebooks), but it worked from the command line.

ValueError: SHA256 hash of downloaded file (motif_db.meme) does not match the known hash: expected sha256:31d3fa1117e752b0d3076a73b278b59bb4a056d744401e9d5861310d03186cfd but got 1667eaf9ca2abb37fa21b541faa9e1676690b58a1206078255a9d7e389731dbc. Deleted download for safety. The downloaded file may have been corrupted or the known hash may be outdated.
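
For a manually downloaded file, the hash can be computed locally and compared with the value quoted in the error. This is a minimal sketch using only the standard library; sha256_of_file and matches_known_hash are hypothetical helper names, and the "sha256:" prefix handling is an assumption based on the format shown in the message above.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_known_hash(path, known_hash):
    """Compare a file's digest with a hash given as 'sha256:<hex>' or '<hex>'."""
    expected = known_hash.split(":", 1)[-1].lower()
    return sha256_of_file(path) == expected
```

Calling matches_known_hash("motif_db.meme", "sha256:...") with the expected value from the error would show whether the local copy is the file the package expects or a newer upload.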

Many thanks again for your help!

Best, Ken

LukasMahieu commented 10 hours ago

Hey Ken, the file must have been updated without the SHA being updated. I'll fix this and add some unit tests so this won't happen again. We'll look into the HTML/command-line issue as well. Thanks for letting us know!
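
In the meantime, one thing worth checking is whether the notebook kernel can see the tomtom executable at all. This is only a guess at a possible cause, not something confirmed in this thread: a Jupyter kernel can run with a different PATH than an interactive shell. The sketch below uses only the standard library, and find_executable is a hypothetical helper name.

```python
import os
import shutil


def find_executable(name):
    """Return the full path of `name` if it is on this process's PATH, else None."""
    return shutil.which(name)


tomtom_path = find_executable("tomtom")
if tomtom_path is None:
    # Print PATH so it can be compared with `echo $PATH` in a terminal.
    print("tomtom not found on the PATH of this Python process")
    print("PATH =", os.environ.get("PATH", ""))
else:
    print("tomtom found at", tomtom_path)
```

Running the same cell in the notebook and in a plain Python session started from the terminal would show whether the two environments differ.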