XiaoTaoWang / EagleC

A deep-learning framework for predicting a full range of structural variations from bulk and single-cell contact maps

inquiries on output files #13

Open distilledchild opened 1 year ago

distilledchild commented 1 year ago

Hi,

First of all, thank you for the great tool. Is there a way to convert the output files to VCF format? I want to compare EagleC's SV calls with those from other SV callers. What would be the best option for this? I was thinking of running SURVIVOR after converting them to VCF.

Second, when I plot with the plot-intraSVs command, I get:

Traceback (most recent call last):
  File "/eaglecnglatest/bin/plot-intraSVs", line 75, in <module>
    run()
  File "/eaglecnglatest/bin/plot-intraSVs", line 57, in run
    chrom, interval = args.region.split(':')
AttributeError: 'NoneType' object has no attribute 'split'

and with plot-interSVs:

Traceback (most recent call last):
  File "/eaglecnglatest/bin/plot-interSVs", line 62, in <module>
    run()
  File "/eaglecnglatest/bin/plot-interSVs", line 54, in run
    vis = interChrom(args.cool_uri, args.chroms, correct=correct)
  File "eaglec/visualize.pyx", line 212, in eaglec.visualize.interChrom.__init__
  File "/eaglecnglatest/lib/python3.8/site-packages/cooler/api.py", line 75, in __init__
    self.filename = store.file.filename
AttributeError: 'NoneType' object has no attribute 'file'

Do you have any suggestions?

Third, I used the raw matrix without normalization; would that be fine? Next, even though only 4 SV types are well detected by the tool, can I get information about insertions?

Also, when I compare SVs from EagleC with those from other tools, would it be fine to use the SVs combined from all resolutions (1K, 5K, 10K and 50K)?

Finally, after running EagleC, I found multiple files and folders. What is the difference between highres.txt and .txt? Also, at 5K resolution there was a combined.txt instead of a highres.txt, along with multiple folders (the files in the folders look like intermediate files for the final results). Could you give me some information about them, please?
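Both tracebacks above end in an attribute lookup on `None` (`args.region` in one, the object built from `args.cool_uri` in the other), which is what argparse produces when an optional argument is left off the command line. The sketch below only illustrates that behaviour. The option names `--cool-uri` and `--region`, the helper `parse_region`, and the `chrom:start-end` region format are assumptions made for illustration, not the actual EagleC command-line interface.

```python
import argparse


def build_parser():
    # Hypothetical option names; the real plot-intraSVs / plot-interSVs
    # flags may be spelled differently.
    parser = argparse.ArgumentParser(prog="plot-sketch")
    parser.add_argument("--cool-uri")  # attribute is None when omitted
    parser.add_argument("--region")    # attribute is None when omitted
    return parser


def parse_region(region):
    """Split an assumed 'chrom:start-end' string, failing early if it is missing."""
    if region is None:
        raise ValueError("no region given; expected something like 'chr1:1000000-2000000'")
    chrom, interval = region.split(":")
    start, end = (int(x.replace(",", "")) for x in interval.split("-"))
    return chrom, start, end


# Simulate a command line on which no options were passed at all.
args = build_parser().parse_args([])
print(args.region, args.cool_uri)

try:
    parse_region(args.region)
except ValueError as err:
    print("caught:", err)
```

An explicit check like the one in `parse_region` turns the opaque `AttributeError` into a message that names the missing input.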

XiaoTaoWang commented 1 year ago
  1. About the format conversion: I think that instead of converting EagleC's outputs to VCF, it would be easier to write a script that extracts the breakpoint information from the VCF files.
  2. What commands did you use? I would suggest double-checking your commands to make sure you provided all the necessary information.
  3. For Hi-C data, I think the raw matrix should be fine, but as I mentioned in the documentation, the prediction accuracy is usually lower than with ICE/CNV-normalized matrices. For HiChIP/ChIA-PET, ICE-normalized matrices must be used instead of the raw matrices.
  4. Some of the predicted translocations should be insertions, but it's hard to distinguish insertions from translocations based on local contact patterns.
  5. It depends on the sequencing depth of your data. If your library was deeply sequenced, I think adding results from the 1Kb resolution should be fine.
  6. I think you can simply ignore or delete the other files and folders once you have the final results (the files suffixed with ".5K_combined.txt").
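The script suggested in point 1 could look roughly like the sketch below. It is a minimal illustration, not part of EagleC. It assumes the other callers write the second breakpoint either in the standard `END` INFO field, in a caller-specific `CHR2` INFO field, or in a BND-style ALT such as `N[chr5:12345[`. The function names and the 50 kb matching tolerance are hypothetical choices and should be adapted to the callers and resolutions being compared.

```python
import re

# Mate locus inside a BND ALT such as 'N[chr5:12345[' or ']chr5:12345]N'.
BND_MATE = re.compile(r"[\[\]](.+):(\d+)[\[\]]")


def parse_info(info):
    """Turn 'SVTYPE=DEL;END=500;IMPRECISE' into a dict (bare flags map to True)."""
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def extract_breakpoints(vcf_lines):
    """Yield (chrom1, pos1, chrom2, pos2, svtype) for every data line of a VCF."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        cols = line.rstrip("\n").split("\t")
        chrom, pos, alt = cols[0], int(cols[1]), cols[4]
        info = parse_info(cols[7])
        mate = BND_MATE.search(alt)
        if mate:
            # Breakend notation carries the partner locus in ALT.
            chrom2, pos2 = mate.group(1), int(mate.group(2))
        else:
            # Assumed fallback: CHR2 (caller-specific) and END (standard INFO key).
            chrom2 = info.get("CHR2", chrom)
            pos2 = int(info.get("END", pos))
        yield chrom, pos, chrom2, pos2, info.get("SVTYPE", "NA")


def is_match(a, b, tol=50_000):
    """True if both ends of calls a and b agree within tol bp, in either order."""
    def same(x, y):
        return (x[0] == y[0] and abs(x[1] - y[1]) <= tol
                and x[2] == y[2] and abs(x[3] - y[3]) <= tol)
    return same(a, b) or same(a, (b[2], b[3], b[0], b[1]))


# Made-up records, for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "chr1\t1000\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=5000",
    "chr1\t2000\t.\tN\tN[chr5:12345[\t.\tPASS\tSVTYPE=BND",
]
for call in extract_breakpoints(example):
    print(call)
```

Once the EagleC predictions are loaded into tuples of the same shape, `is_match` can be applied pairwise. A tolerance on the order of the bin size is a natural starting point, since breakpoints called from contact maps are only resolved to the matrix resolution.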
distilledchild commented 1 year ago

@XiaoTaoWang Thank you so much for your help and advice!