Error in coverage_plots_chromosomes

KolmogorovLab / Wakhan

Haplotype-specific somatic copy number aberrations/profiling from long-read sequencing data

MIT License


Error in coverage_plots_chromosomes #6

Closed: xwBio closed this issue 3 weeks ago.

xwBio commented 8 months ago

Dear,

I encountered an error when running this software. Could you please help me resolve it? Thank you in advance.

```
DEBUG:root:chr1: Ignoring centromere at 499 of 4980 bins (size 1)
DEBUG:root:chr1: Ignoring centromere at 499 of 4980 bins (size 1)
Traceback (most recent call last):
  File "/Wakhan-main/src/main.py", line 268, in <module>
    main()
  File "/Wakhan-main/src/main.py", line 227, in main
    coverage_plots_chromosomes(csv_df_coverage, csv_df_phasesets, arguments, thread_pool)
  File "/Wakhan-main/src/plots.py", line 294, in coverage_plots_chromosomes
    add_histo_clusters_plot(df_cnr_hp1.log2.values.tolist(), df_cnr_hp2.log2.values.tolist(), states, centers, stdev, arguments, chrom, html_graphs)
  File "/storage/yangjianLab/sunxiwei/software/Wakhan-main/src/plots.py", line 1003, in add_histo_clusters_plot
    fig.add_trace(go.Scatter(x=xn, y=yn, mode='markers', marker=dict(color=cdict[g]), name=ldict[g], opacity=0.7, ),
KeyError: 10
```

Best regards, Xiwei

tahashmi commented 8 months ago

Sure, I will debug it, but in the meantime you can comment out this line and rerun; it will not affect any functionality. It is only there for extra debugging.
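For context, `KeyError: 10` is raised by the lookup `cdict[g]` in the traceback: the plotting code asked for the colour of cluster label 10, but the dictionary has no such key. Commenting out the line, as suggested, avoids the error. If you would rather keep the plot, one option is to build the colour map from the labels that actually occur. The sketch below assumes `cdict` maps integer cluster labels to colour strings (its real definition is not shown in this thread). `build_color_map` is a hypothetical helper and not part of Wakhan.

```python
from itertools import cycle
from typing import Dict, Iterable, Sequence


def build_color_map(labels: Iterable[int], palette: Sequence[str]) -> Dict[int, str]:
    """Give every distinct cluster label a colour, so a lookup like cdict[g]
    cannot raise KeyError for a label that actually occurs in the data.

    Colours are reused (cycled) when there are more labels than palette entries.
    """
    if not palette:
        raise ValueError("palette must contain at least one colour")
    distinct = sorted(set(labels))
    return dict(zip(distinct, cycle(palette)))


# A fixed map that only knows labels 0 and 1 fails on label 10 ...
fixed = {0: "red", 1: "blue"}
try:
    fixed[10]
except KeyError as err:
    print("KeyError:", err)  # KeyError: 10

# ... whereas a map built from the labels present covers every one of them.
cdict = build_color_map([0, 3, 10, 3], ["red", "blue"])
print(cdict)  # {0: 'red', 3: 'blue', 10: 'red'}
```

Cycling keeps the palette short at the cost of repeating colours. That is acceptable for a debug plot, but it can make distinct clusters look alike.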

tahashmi commented 8 months ago

Hi, are you using the latest version of the repo? That line is already disabled there. Please use the latest main branch. Thanks!

xwBio commented 8 months ago

Dear,

After I ran the software with the latest version, the previous issue was resolved.

Thank you very much.

Best regards, Xiwei

xwBio commented 8 months ago

Dear,

However, with the updated version I have run into another error:

```
DEBUG:root:chr22: Ignoring centromere at 103 of 1017 bins (size 1)
INFO:root:Generating coverage/copy numbers plots genome wide
Traceback (most recent call last):
  File "/storage/yangjianLab/sunxiwei/software/Wakhan-main/src/main.py", line 295, in <module>
    main()
  File "/storage/yangjianLab/sunxiwei/software/Wakhan-main/src/main.py", line 283, in main
    integer_fractional_means = integer_fractional_cluster_means(arguments, df_segs_hp1, df_segs_hp2, centers)
  File "/storage/yangjianLab/sunxiwei/software/Wakhan-main/src/utils.py", line 499, in integer_fractional_cluster_means
    integer_centers, fractional_centers = detect_first_copy_integers_fractional_cluster_means(arguments, df_segs_hp1, df_segs_hp2, centers)
  File "/storage/yangjianLab/sunxiwei/software/Wakhan-main/src/utils.py", line 450, in detect_first_copy_integers_fractional_cluster_means
    first_copy = centers[sumsup.index(sorted(sumsup, reverse=True)[1])]
IndexError: list index out of range
```

Best regards, Xiwei

tahashmi commented 8 months ago

Can you help me debug this? I think the centers value is empty, or something else is wrong.
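The failing line, `centers[sumsup.index(sorted(sumsup, reverse=True)[1])]`, can raise `IndexError` in two places. `sorted(...)[1]` fails when `sumsup` has fewer than two entries, and `centers[...]` fails when `centers` is shorter than the index found in `sumsup`. What `sumsup` contains is not shown in the thread. The sketch below only reproduces the indexing pattern and shows a guarded variant. `second_largest_index` is a hypothetical helper, not Wakhan code.

```python
from typing import Optional, Sequence


def second_largest_index(sumsup: Sequence[float]) -> Optional[int]:
    """Index of the second-largest entry, as in the failing expression
    sumsup.index(sorted(sumsup, reverse=True)[1]).

    Returns None (instead of raising IndexError) when fewer than two values exist.
    """
    if len(sumsup) < 2:
        return None
    return list(sumsup).index(sorted(sumsup, reverse=True)[1])


# The original expression fails as soon as only one value is available:
try:
    [4.0].index(sorted([4.0], reverse=True)[1])
except IndexError as err:
    print("IndexError:", err)  # IndexError: list index out of range

print(second_largest_index([3.0, 7.5, 5.0]))  # 2
print(second_largest_index([4.0]))            # None
```

A `None` result would still need handling by the caller, for example by logging that too few clusters were found instead of indexing into `centers`.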

1. Copy the coverage.csv, coverage_ps.csv and <Your genome name>_SNPs.csv files from the data/ output directory of your current run (it should be in src) to a separate directory, e.g., /home/abc/dry_run_data.
2. I have just updated the dry_run command, so git clone the latest code somewhere else.
3. Print some data by adding the following to main.py after line:
   ```
   print(df_segs_hp1)
   print(df_segs_hp2)
   print(centers)
   ```
4. Then rerun the code, adding these options to the command you already use: --dryrun True --dryrun-path <the path where you copied the CSV files in step 1, e.g., /home/abc/dry_run_data/>
5. Please share the log afterwards.

The dry run will be quick because it reuses the existing coverage/pileup data.
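Steps 1 and 4 above can be scripted. The sketch below copies the three CSV files named in step 1 and appends the two dry-run options from step 4 to an existing command line. The function names and the example base command (`python3 main.py`) are placeholders, not Wakhan's documented interface. Only `--dryrun True` and `--dryrun-path` come from the instructions above.

```python
import shutil
from pathlib import Path
from typing import List, Union

PathLike = Union[str, Path]


def prepare_dry_run(data_dir: PathLike, dry_run_dir: PathLike, genome_name: str) -> List[Path]:
    """Copy the CSVs listed in step 1 from a previous run into a separate directory."""
    data_dir, dry_run_dir = Path(data_dir), Path(dry_run_dir)
    dry_run_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in ("coverage.csv", "coverage_ps.csv", f"{genome_name}_SNPs.csv"):
        src = data_dir / name
        if not src.is_file():
            raise FileNotFoundError(f"expected {src} from the previous run")
        copied.append(Path(shutil.copy2(src, dry_run_dir / name)))
    return copied


def with_dryrun_flags(base_cmd: List[str], dry_run_dir: PathLike) -> List[str]:
    """Append the step-4 options; the trailing slash mirrors the example path above."""
    return list(base_cmd) + ["--dryrun", "True", "--dryrun-path", f"{Path(dry_run_dir)}/"]


# Hypothetical usage; replace the base command with the one you normally run:
# prepare_dry_run("Wakhan-main/src/data", "/home/abc/dry_run_data", "<Your genome name>")
print(with_dryrun_flags(["python3", "main.py"], "/home/abc/dry_run_data"))
```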

Thanks!

xwBio commented 7 months ago

Dear,

Please find the debug output attached: Wakhan_debug_out.txt

Best regards, Xiwei

tahashmi commented 7 months ago

Thanks. It looks like the correct means are somehow not being passed to the HMM. Are you using --phaseblock-flipping-enable True? If so, also enabling --phaseblocks-enable True will help you see the phaseblocks.

Can you try running this dry run again, inserting the following means/stdev above this line:

```
means = [0, 5.5, 13, 22, 28, 32]
stdev = [1, 5, 5, 5, 5, 5]
```

After this, if possible, take a look at <genome_name>_genome_copynumber.html; it will show what the coverage looks like.
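To illustrate why the state means matter, here is a generic Viterbi decoder for an HMM with Gaussian emissions. It assumes each state emits coverage values from N(mean, stdev²), starts with uniform probabilities, and uses a "sticky" transition matrix (stay with probability `stay_prob`, otherwise switch uniformly). The thread does not show how Wakhan's HMM is actually parameterized. `viterbi_states` and `stay_prob` are hypothetical names, and this sketch is not Wakhan's implementation. It only uses the means/stdev values suggested above.

```python
import math
from typing import List, Sequence


def gaussian_logpdf(x: float, mean: float, sd: float) -> float:
    """Log-density of N(mean, sd**2) evaluated at x."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd) - 0.5 * math.log(2 * math.pi)


def viterbi_states(values: Sequence[float], means: Sequence[float],
                   stdev: Sequence[float], stay_prob: float = 0.9) -> List[int]:
    """Most likely state index per bin; stay_prob must lie strictly between 0 and 1."""
    k = len(means)
    if k < 2 or len(stdev) != k:
        raise ValueError("need >= 2 states and one stdev per mean")
    if not values:
        return []
    log_stay = math.log(stay_prob)
    log_switch = math.log((1.0 - stay_prob) / (k - 1))

    def log_trans(prev: int, cur: int) -> float:
        return log_stay if prev == cur else log_switch

    # score[s]: best log-probability of any path ending in state s at the current bin
    score = [-math.log(k) + gaussian_logpdf(values[0], means[s], stdev[s]) for s in range(k)]
    backptr: List[List[int]] = []
    for x in values[1:]:
        prev_best, new_score = [], []
        for s in range(k):
            best_prev = max(range(k), key=lambda p: score[p] + log_trans(p, s))
            prev_best.append(best_prev)
            new_score.append(score[best_prev] + log_trans(best_prev, s)
                             + gaussian_logpdf(x, means[s], stdev[s]))
        backptr.append(prev_best)
        score = new_score

    # Trace back from the best-scoring final state
    state = max(range(k), key=lambda s: score[s])
    path = [state]
    for ptrs in reversed(backptr):
        state = ptrs[state]
        path.append(state)
    return path[::-1]


means = [0, 5.5, 13, 22, 28, 32]
stdev = [1, 5, 5, 5, 5, 5]
print(viterbi_states([0.1, 0.3, 12.5, 13.2, 32.0, 32.4], means, stdev))  # [0, 0, 2, 2, 5, 5]
```

If the supplied means do not match the coverage levels actually present, every bin is forced into a poorly fitting state. This is one way to read the diagnosis that the HMM was not receiving correct means.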

xwBio commented 7 months ago

It worked. Thank you.

Additionally, as I cannot find a manual, could you kindly provide some guidance on how this software works?