adoebley / Griffin

A flexible framework for nucleosome profiling of cell-free DNA

Script for the nucleosome peak amplitude calculation? #4

Closed hyunhwan-jeong closed 2 years ago

hyunhwan-jeong commented 2 years ago

Hello @adoebley,

I recently read the Griffin manuscript. In the section "Nucleosome profile feature quantification", you describe calculating the amplitude of the nucleosome peaks surrounding the site using an FFT, but I could not find the relevant script in the GitHub repository. If it is there, could you point me to where it is located? Otherwise, I would appreciate it if you could share a code snippet.

Kind regards,

Hyun-Hwan Jeong

adoebley commented 2 years ago

Hi Hyun-Hwan,

The code is in Griffin_analyses (https://github.com/adoebley/Griffin_analyses), but it might be difficult to find in there, so here is a code snippet (data is a pandas DataFrame of coverage profiles, with positions in the columns):

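import numpy as np
fft_columns = np.arange(-960, 960, 15)  # 128 positions, 15 bp apart; assumes integer column labels
fft_res = np.fft.fft(data[fft_columns])  # FFT along each row (one coverage profile per row)
amplitude = np.abs(fft_res[:, 10])  # magnitude of frequency bin 10 for every profile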

Best, Anna-Lisa
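For readers following along, here is a minimal sketch of what frequency bin 10 corresponds to in the snippet above. It only works through the arithmetic of the window (128 positions at 15 bp spacing) and checks it on a synthetic cosine. The variable names and the test signal are assumptions made for illustration and are not taken from the Griffin code.

```python
import numpy as np

step_bp = 15
positions = np.arange(-960, 960, step_bp)   # same grid as in the snippet
n = len(positions)                          # 128 samples spanning 1920 bp

# Frequency (cycles per bp) of every FFT bin for this grid
freqs = np.fft.fftfreq(n, d=step_bp)
period_bp = 1.0 / freqs[10]                 # bin 10 = 10 cycles per 1920 bp, about 192 bp

# Synthetic profile (an assumption for illustration): baseline 1 with a
# 192 bp oscillation of amplitude 0.5
signal = 1.0 + 0.5 * np.cos(2 * np.pi * positions / 192)
spectrum = np.abs(np.fft.fft(signal))

# For an oscillation that fits the window exactly, the bin magnitude is
# (oscillation amplitude) * n / 2, so 0.5 * 128 / 2 = 32 here
amp_bin10 = spectrum[10]
# Strongest non-DC bin in the positive-frequency half
peak_bin = int(np.argmax(spectrum[1 : n // 2])) + 1
```

Under these assumptions, bin 10 picks out a repeat of roughly 192 bp, which is in the range of typical nucleosome spacing. Whether that is the exact reasoning behind the choice of index 10 is not stated in this thread.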

hyunhwan-jeong commented 2 years ago

Thanks for sharing it. I would like to confirm that data contains the coverage profiles generated by the griffin_nucleosome_profiling pipeline (e.g., https://github.com/adoebley/Griffin/blob/main/demo/griffin_nucleosome_profiling_demo_files/expected_results/Healthy_demo.all_sites.coverage.txt).

I want to note that I also searched the repo but didn't find any relevant code. I do see that you used FFT for the ML feature generation, but the code you shared wasn't there; perhaps it was not uploaded? I double-checked using both grep and GitHub code search (https://github.com/adoebley/Griffin_analyses/search?q=fft), and nothing was found.

Anyway, that was helpful and I appreciate your answer.

Best,

Hyun-Hwan Jeong

hyunhwan-jeong commented 2 years ago

If my assumption is correct, the code raises the following error:

KeyError: "None of [Int64Index([-960, -945, -930, -915, -900, -885, -870, -855, -840, -825,\n            ...\n             810,  825,  840,  855,  870,  885,  900,  915,  930,  945],\n           dtype='int64', length=128)] are in the [columns]"

And I believe the first line has to be np.arange(-960,960,15).astype("str").

Thanks,

Hyun-Hwan Jeong
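To make the fix above concrete, here is a minimal, self-contained sketch of the amplitude calculation when the position columns are stored as strings, as the KeyError suggests they are after reading the coverage file. The DataFrame is synthetic, and the "sample" column and all values are hypothetical stand-ins, not the layout of the real Griffin output.

```python
import numpy as np
import pandas as pd

positions = np.arange(-960, 960, 15)

# Synthetic stand-in for a coverage table (assumption): 3 profiles with a
# 192 bp oscillation plus a little noise, stored under string column labels
rng = np.random.default_rng(0)
profiles = (
    1.0
    + 0.5 * np.cos(2 * np.pi * positions / 192)
    + rng.normal(0.0, 0.01, size=(3, len(positions)))
)
data = pd.DataFrame(profiles, columns=[str(p) for p in positions])
data.insert(0, "sample", ["a", "b", "c"])   # hypothetical metadata column

# Select the position columns by their string labels, matching the fix
# proposed above, and convert to a plain float array
fft_columns = [str(p) for p in positions]
coverage = data[fft_columns].to_numpy(dtype=float)

# Row-wise FFT, then the magnitude of frequency bin 10 for each profile
fft_res = np.fft.fft(coverage, axis=1)
amplitude = np.abs(fft_res[:, 10])
```

Selecting the columns explicitly also keeps any non-numeric metadata columns out of the FFT input.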