Closed Kdreval closed 3 years ago
Hi Kdreval,
Thank you for trying our tool. We greatly appreciate your interest.
First, I noticed from your command line that you are missing one dash when specifying the amp threshold (-t-amp 2.99) which should be (--t-amp 2.99). Can you re-run the command with this correction and let us know please?.
Thank you for pointing out the issue with "combine_roi_files.r". I will look at it and fix it.
Regards,
-Abdallah
Thank you Abdallah for answering so quickly! I have noticed that too and tried running with the correct flag, it did not solve the error. This is log from the version that points to the combine_roi_files.r
script path in the annotate_peaks.sh
:
perl SV-Hotspot/src/sv-hotspot.pl -o OUTPUT -g hg38 -C chrX --sv SV-Hotspot/test_data/sv.bedpe -e SV-Hotspot/test_data/exp.tsv -c SV-Hotspot/test_data/cna.tsv -r SV-Hotspot/test_data/enhancers.bed --chip-cov SV-Hotspot/test_data/H3K27ac.bg --chip-cov-lbl H3K27ACETYL -w 100000 -s 30000 -d 50000 --t-amp 2.99 --t-del 1.35 --stat-test wilcox.test --pval 0.05 --plot-top-peaks 10 --left-ext 100000 --right-ext 100000
Name "Getopt::Long::CallBack::OVERLOAD" used only once: possible typo at /usr/share/perl5/overload.pm line 34.
########################################################
###### SV-HotSpot v1.0.0 ######
########################################################
SVs file : SV-Hotspot/test_data/sv.bedpe
Genome name : hg38
Sliding window size : 100000
Sliding window step : 30000
Output directory : OUTPUT/sv-hotspot-output
Annotation file : SV-Hotspot/annotations/hg38/genes.bed
peakPick window size : 100
peakPick minimum SD : 5
% of samples cutoff : 10
Expression file : SV-Hotspot/test_data/exp.tsv
Copy number file : SV-Hotspot/test_data/cna.tsv
Amplification cutoff : 2.99
Deletion cutoff : 1.35
Genes of interest : 0
Region of interest : SV-Hotspot/test_data/enhancers.bed
Chromosomes to analyze: chrX
SV types to analyze : ALL
Peak merge distance : 50000
Number of nearby genes: 1
ChIP-Seq cov. file : SV-Hotspot/test_data/H3K27ac.bg
Statistical test : wilcox.test
Plot top peaks : 10
ChIP-seq coverage : H3K27ACETYL
Left extension size : 100000
Right extension size : 100000
% of samples to merge : 5
# of stop-merge peaks : 0
########################################################
--------------------------------------------------
Input Verification Step
--------------------------------------------------
Checking structural variants file format...PASS.
Checking annotation file format...PASS.
Checking region of interest file(s) format...PASS.
Checking ChIP-Seq coverage file format...PASS.
Checking copy number file format...PASS.
Checking if the expression file has no duplicated rows...PASS.
Checking if feature name in the expression matches the one in the annotation file...PASS.
--------------------------------------------------
STEP 1: Identifying Peaks (hotspot regions)
--------------------------------------------------
Segmenting the genome into sliding windows
Chromosomes to segment:
[1] "chrX"
chrX
Overlapping breakpoints with sliding windows
Splitting the overlapped file by chromosome
Summarizing sample counts for chromosome chrX
Reading window data...
5202 unique windows were found
Summarizing...
Counting samples for "ALL" structural variants... 5202 windows reported (including windows w/o count).
Counting samples for "INV" structural variants... 5202 windows reported (including windows w/o count).
Counting samples for "BND" structural variants... 5202 windows reported (including windows w/o count).
Counting samples for "DUP" structural variants... 5202 windows reported (including windows w/o count).
Counting samples for "DEL" structural variants... 5202 windows reported (including windows w/o count).
Counting samples for "INS" structural variants... 5202 windows reported (including windows w/o count).
Call structural variant peaks (hotspots)
Reading sliding window sample count...
Chromosomes to analyze:
[1] "chrX"
chrX
Number of peaks: 153
Warning messages:
1: Removed 59 rows containing missing values (geom_bar).
2: Removed 1 rows containing missing values (geom_text).
--------------------------------------------------
STEP 2: Annotating Peaks
--------------------------------------------------
Summarizing annotated peaks...Done.
------------------------------------------------------------------------------------
STEP 3: Determining the association between SVs and the expression of nearby genes
-------------------------------------------------------------------------------------
CMD to run: Rscript SV-Hotspot/src/determine_gene_association_v2.r OUTPUT/sv-hotspot-output SV-Hotspot/test_data/exp.tsv SV-Hotspot/test_data/cna.tsv 2.99 1.35 0.05 wilcox.test
Reading all breakpoints...done.
Reading peak information...done.
Reading expression data...done.
Reading copy number data and overlap with genes...done.
Preparing data for statistical test...NULL
done.
Running statistical test...done.
--------------------------------------------------
STEP 4: Visualizing hotspot regions
--------------------------------------------------
Reading all breakpoints...done.
Reading sliding window sample count...done.
Reading copy number data for genes and peaks...Error: Peaks copy number file "OUTPUT/sv-hotspot-output/processed_data/peaks_with_cn.bed" was not found!.
Execution halted
It is my understanding that the missing file is created in the determine_gene_association_v2.r
? I can confirm it is not being generated. Here would be the directory structure of the OUTPUT
folder:
OUTPUT/
└── sv-hotspot-output
├── annotated_peaks_summary.tsv
├── genes.associated.with.SVs.tsv
├── peaks
│ ├── all_peaks.bed
│ ├── chrX.peak.bed
│ ├── chrX.peak.group.bed
│ ├── chrX.png
│ └── ucsc_custom_track_files
└── processed_data
├── all_bp.bed
├── annotated_peaks_summary.tsv
├── counts.rds
├── del_dup_sv.bed
├── fisher-results.tsv
├── genes.bed
├── genes_with_cn.bed
├── peaks_overlap_bp.tsv
├── peaks_overlap_with_region_of_interest.tsv
└── peaks_with_overlap_nearby_genes.tsv
4 directories, 16 files
Please let me know if there is more information I can provide to help with this issue and thanks again for working on it!
Thank you for pointing to this bug. I did fix the bug, can you try and let me know please?. Please pull the scripts again to get the updated script.
I can confirm I run it with the demo data. The only thing is that it seems to be requiring data.table
package which you might want to add to the list of dependencies.
Please feel free to close the issue as resolved. Thank you for the amazing support! I appreciate your time and help with this issue!
I am glad that you were able to run the demo and thank you for pointing out the missing of data.table package. Looks like we forgot to include it when we updated the instruction page as it was there in the old version.
Thank you again for using our tool and please feel free to contact us for any additional support you may need in the future.
Regards,
-Abdallah
Hi Ha and Abdallah! Very interesting tool, thank you for creating it! Looking forward to using SV-Hotspot, but identified some errors while running test data that comes with the tool. After cloning the repo, I created the directory OUTPUT and used the command from README to run it on test data. Here is the complete log:
It seems to be missing the path to
combine_roi_files.r
script in theannotate_peaks.sh
. Adding the full path tocombine_roi_files.r
seems to only mask the issue, as error now appears at step 4:Would appreciate any suggestion on how to make it work.