fuxialexander / marvel

Multigranular Analysis of Regulatory Variants on the Epigenomic Landscape
MIT License

Error executing process > 'collect_region_chunk' #3

Closed: FTD2018 closed this issue 3 years ago

FTD2018 commented 3 years ago

Hi, thanks for developing MARVEL. I tried to run it but encountered an error: [image] I have no idea what's wrong. The input is: [image]

Could you help me check it? Thank you.

fuxialexander commented 3 years ago

Can you add a breakpoint on line 32 of bin/collect_results.py or simply print out profiles[0] before line 32?

FTD2018 commented 3 years ago


I added print(profiles[0]) before line 32, and the result is: [image]

It seems that the profiles list is empty. Is the problem with my input?

Thank you

FTD2018 commented 3 years ago

The input is: [image] I modified "test.config" in the conf folder, but I didn't change the "weights" parameter. The command is "nextflow main.nf -profile test -resume".

fuxialexander commented 3 years ago

In that case, can you try adding print(i) after line 29 and check whether that file really exists?

FTD2018 commented 3 years ago

I added the "print(i)" line, and the output is: [image]

FTD2018 commented 3 years ago

Hi Alexander, I think this is a path problem. The command that was run is "collect_results.py -n human.sorted.merged.enhancer -r ./ -a /data01/xuxiaopeng/target_analysis/MARVEL/human.enhancer.bed -o ./".

In line 27 of your collect_results.py, "name_prefix = args.result_path + args.region_name" gives name_prefix = "./human.sorted.merged.enhancer", so in

    for i in sorted(glob(name_prefix + "*_profiles.npz")):
        profiles.append(load_npz(i))

nothing matches and profiles stays empty.

The *_profiles.npz files are located at: /data01/xuxiaopeng/target_analysis/MARVEL/results/test/_chunks/_profiles.npz

Looking forward to your reply.

Thanks
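The failure described above can be reproduced in isolation: glob only matches paths relative to the current working directory, so if the *_profiles.npz chunks were never staged into the process's work directory, the list stays empty and profiles[0] raises an IndexError. Below is a minimal sketch, not MARVEL's actual code, of a guarded version of that collection step. The function name find_profile_files and the recursive fallback search are hypothetical additions for diagnosis; only the prefix logic (result_path + region_name) mirrors the line quoted from collect_results.py.

```python
import glob
import os


def find_profile_files(result_path, region_name, search_root=None):
    """Return sorted *_profiles.npz paths for a region prefix.

    Hypothetical helper: builds the prefix the same way as the quoted line
    (plain string concatenation of result_path and region_name). If nothing
    matches, it raises an error that lists where matching files actually are,
    instead of failing later with an IndexError on profiles[0].
    """
    name_prefix = result_path + region_name
    files = sorted(glob.glob(name_prefix + "*_profiles.npz"))
    if files:
        return files
    # Nothing matched next to the prefix: search recursively so the error
    # message shows where the chunk files really ended up.
    root = search_root if search_root is not None else result_path
    stray = sorted(
        glob.glob(os.path.join(root, "**", "*_profiles.npz"), recursive=True)
    )
    raise FileNotFoundError(
        f"no files match {name_prefix!r} + '*_profiles.npz'; "
        f"files found under {root!r}: {stray}"
    )
```

Called as find_profile_files("./", "human.sorted.merged.enhancer") inside the process's work directory, it either returns the chunk files or reports the subdirectory they landed in, which is the same information the print(profiles[0]) and print(i) checks were gathering.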

fuxialexander commented 3 years ago

Sorry, I haven't checked my GitHub notifications recently. In theory, when a Nextflow process starts, its input files are linked into the work directory, so the path "./" should be fine. However, the path /data01/xuxiaopeng/target_analysis/MARVEL/results/test/_chunks/ looks a bit different from what I'd expect: it should be either "enhancer_chunk" or "promoter_chunk".

Can you add 'scan_input.view()' at line 625 and report what it prints?

FTD2018 commented 3 years ago

Thanks for your responses.

Actually, to avoid the path and variable-name problems, I replaced the contents of the test_input folder with my own data while keeping the same file names, and MARVEL then ran successfully.

However, I have some questions about the results. My data come from targeted sequencing of several GWAS loci.

1: In the summary folder, the CSV file lists the motifs, but how can I easily tell which motif is affected by which variant?

2: How can I provide other motifs (such as JASPAR) to MARVEL instead of HOCOMOCO?

3: For the threshold, can I use a looser FDR cutoff for significance, given that the target regions contain far fewer enhancers than the whole genome? In other words, does the FDR depend on both the p-value and the number of enhancers, or only on the p-value? I ask because I get p-val: 8.96E-6, FDR: 0.125 in my result, versus p-val: 9.77E-6, FDR: 0.059 in your published paper.

4: The PDF file looks a little weird, like this (somehow, I can't upload the picture):

Do you have any suggestions? Thanks

fuxialexander commented 3 years ago

1: In the summary folder, the CSV file lists the motifs, but how can I easily tell which motif is affected by which variant?

I did not include that function. What I did before was to rescan the sequences with FIMO or MOODS and overlap the motif sites with the variants.
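The overlap step can be sketched as follows. This is a hypothetical helper, not part of MARVEL, and the MotifHit/Variant tuple layouts are assumptions. It assumes the motif hits (e.g. parsed from FIMO or MOODS output) and the variants have already been converted to 0-based, half-open BED-style coordinates on the same reference; FIMO reports 1-based start/stop positions, so convert those before use.

```python
from collections import defaultdict
from typing import NamedTuple


class MotifHit(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    motif: str


class Variant(NamedTuple):
    chrom: str
    pos: int    # 0-based position of the first reference base
    ref: str
    alt: str
    vid: str


def motifs_hit_by_variants(hits, variants):
    """Return (variant id, motif name, site start, site end) for each overlap.

    A variant spans [pos, pos + len(ref)); it touches a motif site when the
    two half-open intervals intersect. Uses a simple all-pairs scan within
    each chromosome, which is adequate for the few loci of a targeted panel.
    """
    by_chrom = defaultdict(list)
    for h in hits:
        by_chrom[h.chrom].append(h)
    out = []
    for v in variants:
        v_end = v.pos + max(len(v.ref), 1)
        for h in by_chrom.get(v.chrom, []):
            if h.start < v_end and v.pos < h.end:
                out.append((v.vid, h.motif, h.start, h.end))
    return out
```

For example, a SNV at 0-based position 105 overlaps a site spanning [100, 110), while a SNV at 110 does not, because the end coordinate is exclusive. For genome-scale inputs an interval tree or bedtools intersect would be the more usual choice.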

2: How can I provide other motifs (such as JASPAR) to MARVEL instead of HOCOMOCO?

You can search for HOCOMOCO in the bin/get_hocomoco_motif.sh file and replace its URL with the URL of the desired motif collection.

3: For the threshold, can I use a looser FDR cutoff for significance, given that the target regions contain far fewer enhancers than the whole genome? In other words, does the FDR depend on both the p-value and the number of enhancers, or only on the p-value? I ask because I get p-val: 8.96E-6, FDR: 0.125 in my result, versus p-val: 9.77E-6, FDR: 0.059 in your published paper.

I think the FDR depends on both the p-value ranking and the total number of tests. Using a looser threshold may invite further argument with journal reviewers; you might consider trying other multiple-testing corrections such as BH or BY. In the worst case, you can focus on the top hits and validate them experimentally.
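To illustrate why an FDR value depends on both the rank of a p-value and the total number of tests, here is a minimal dependency-free sketch of the Benjamini-Hochberg (BH) and Benjamini-Yekutieli (BY) adjustments. It is a generic textbook implementation, not necessarily the correction MARVEL applies internally; the function name fdr_adjust is hypothetical.

```python
def fdr_adjust(pvals, method="bh"):
    """Return adjusted p-values (q-values) in the original input order.

    method="bh": Benjamini-Hochberg, q_(i) = min over j >= i of m * p_(j) / j
    method="by": Benjamini-Yekutieli, the same scaled by c(m) = sum_{k=1..m} 1/k
    """
    m = len(pvals)
    if m == 0:
        return []
    if method == "bh":
        c = 1.0
    elif method == "by":
        c = sum(1.0 / k for k in range(1, m + 1))
    else:
        raise ValueError(f"unknown method: {method!r}")
    # Input indices sorted by ascending p-value; 1-based rank = position + 1.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps every q-value at 1
    # Walk from the largest p-value down so q-values are monotone in rank.
    for pos in range(m - 1, -1, -1):
        i = order[pos]
        q = pvals[i] * m * c / (pos + 1)
        running_min = min(running_min, q)
        adjusted[i] = running_min
    return adjusted
```

For instance, fdr_adjust([1e-5] + [0.5] * 99)[0] is about 1e-3, while the same p-value among 1000 tests gives about 1e-2; conversely, many other strong p-values ranking ahead of a given one push its q-value down. So under BH, similar p-values can map to quite different FDRs in a targeted panel and in a genome-wide scan. Note that BY is never less conservative than BH, since c(m) >= 1. For real analyses, statsmodels' multipletests (method="fdr_bh" or "fdr_by") and SciPy's false_discovery_control are tested alternatives.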

4: The PDF file looks a little weird, like this (somehow, I can't upload the picture):

Maybe you can send it to my email? fuxialexander [at] gmail [dot] com


Thanks