ChrisMaherLab / SV-Hotspot

8 stars 6 forks source link

Running demo data #2

Closed Kdreval closed 3 years ago

Kdreval commented 3 years ago

Hi Ha and Abdallah! Very interesting tool, thank you for creating it! Looking forward to using SV-Hotspot, but identified some errors while running test data that comes with the tool. After cloning the repo, I created the directory OUTPUT and used the command from README to run it on test data. Here is the complete log:

perl SV-Hotspot/src/sv-hotspot.pl -o OUTPUT -g hg38 -C chrX --sv SV-Hotspot/test_data/sv.bedpe -e SV-Hotspot/test_data/exp.tsv -c SV-Hotspot/test_data/cna.tsv -r SV-Hotspot/test_data/enhancers.bed --chip-cov SV-Hotspot/test_data/H3K27ac.bg --chip-cov-lbl H3K27ACETYL -w 100000 -s 30000 -d 50000 -t-amp 2.99 --t-del 1.35 --stat-test wilcox.test --pval 0.05 --plot-top-peaks 10 --left-ext 100000 --right-ext 100000
Name "Getopt::Long::CallBack::OVERLOAD" used only once: possible typo at /usr/share/perl5/overload.pm line 34.
Value "-amp" invalid for option t (number expected)
Unknown option: amp
########################################################
######             SV-HotSpot v1.0.0              ######
########################################################
 SVs file              : SV-Hotspot/test_data/sv.bedpe
 Genome name           : hg38
 Sliding window size   : 100000
 Sliding window step   : 30000
 Output directory      : OUTPUT/sv-hotspot-output
 Annotation file       : SV-Hotspot/annotations/hg38/genes.bed
 peakPick window size  : 100
 peakPick minimum SD   : 5
 % of samples cutoff   : 10
 Expression file       : SV-Hotspot/test_data/exp.tsv
 Copy number file      : SV-Hotspot/test_data/cna.tsv
 Amplification cutoff  : 2.8
 Deletion cutoff       : 1.35
 Genes of interest     : 0
 Region of interest    : SV-Hotspot/test_data/enhancers.bed
 Chromosomes to analyze: chrX
 SV types to analyze   : ALL
 Peak merge distance   : 50000
 Number of nearby genes: 1
 ChIP-Seq cov. file    : SV-Hotspot/test_data/H3K27ac.bg
 Statistical test      : wilcox.test
 Plot top peaks        : 10
 ChIP-seq coverage     : H3K27ACETYL
 Left extension size   : 100000
 Right extension size  : 100000
 % of samples to merge : 5
 # of stop-merge peaks : 0
########################################################

--------------------------------------------------
Input Verification Step
--------------------------------------------------
Checking structural variants file format...PASS.
Checking annotation file format...PASS.
Checking region of interest file(s) format...PASS.
Checking ChIP-Seq coverage file format...PASS.
Checking copy number file format...PASS.
Checking if the expression file has no duplicated rows...PASS.
Checking if feature name in the expression matches the one in the annotation file...PASS.

--------------------------------------------------
STEP 1: Identifying Peaks (hotspot regions) 
--------------------------------------------------
Segmenting the genome into sliding windows
Chromosomes to segment:
[1] "chrX"
chrX 
Overlapping breakpoints with sliding windows
Splitting the overlapped file by chromosome
Summarizing sample counts for chromosome chrX 
Reading window data...
5202 unique windows were found
Summarizing...
Counting samples for "ALL" structural variants... 5202 windows reported (including windows w/o count).
Counting samples for "INV" structural variants... 5202 windows reported (including windows w/o count).
Counting samples for "BND" structural variants... 5202 windows reported (including windows w/o count).
Counting samples for "DUP" structural variants... 5202 windows reported (including windows w/o count).
Counting samples for "DEL" structural variants... 5202 windows reported (including windows w/o count).
Counting samples for "INS" structural variants... 5202 windows reported (including windows w/o count).

Call structural variant peaks (hotspots)
Reading sliding window sample count...
Chromosomes to analyze:
[1] "chrX"
chrX 
Number of peaks:  153 
Warning messages:
1: Removed 59 rows containing missing values (geom_bar). 
2: Removed 1 rows containing missing values (geom_text). 

--------------------------------------------------
STEP 2: Annotating Peaks 
--------------------------------------------------
SV-Hotspot/src/annotate_peaks.sh: line 81: combine_roi_files.r: command not found
Summarizing annotated peaks...Error in length(roi.name.cols) : object 'roi.name.cols' not found
Execution halted

------------------------------------------------------------------------------------
STEP 3: Determining the association between SVs and the expression of nearby genes
-------------------------------------------------------------------------------------
CMD to run: Rscript SV-Hotspot/src/determine_gene_association_v2.r OUTPUT/sv-hotspot-output SV-Hotspot/test_data/exp.tsv SV-Hotspot/test_data/cna.tsv 2.8 1.35 0.05 wilcox.test 
Reading all breakpoints...done.
Reading peak information...done.
Reading expression data...done.
Reading copy number data and overlap with genes...done.
Preparing data for statistical test...NULL
done.
Running statistical test...done.
Error in file(file, "rt") : cannot open the connection
Calls: read.table -> file
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'OUTPUT/sv-hotspot-output/processed_data/annotated_peaks_summary.tsv': No such file or directory
Execution halted

--------------------------------------------------
STEP 4: Visualizing hotspot regions 
--------------------------------------------------
cut: OUTPUT/sv-hotspot-output/annotated_peaks_summary.tsv: No such file or directory
Warning message:
NAs introduced by coercion 
Warning message:
NAs introduced by coercion 
Reading all breakpoints...Error in file(file, "rt") : cannot open the connection
Calls: read.table -> file
In addition: Warning messages:
1: In file(file, "rt") :
  'raw = FALSE' but 'OUTPUT/sv-hotspot-output' is not a regular file
2: In file(file, "rt") :
  cannot open file 'OUTPUT/sv-hotspot-output': it is a directory
Execution halted

It seems to be missing the path to combine_roi_files.r script in the annotate_peaks.sh. Adding the full path to combine_roi_files.r seems to only mask the issue, as error now appears at step 4:

perl SV-Hotspot/src/sv-hotspot.pl -o OUTPUT -g hg38 -C chrX --sv SV-Hotspot/test_data/sv.bedpe -e SV-Hotspot/test_data/exp.tsv -c SV-Hotspot/test_data/cna.tsv -r SV-Hotspot/test_data/enhancers.bed --chip-cov SV-Hotspot/test_data/H3K27ac.bg --chip-cov-lbl H3K27ACETYL -w 100000 -s 30000 -d 50000 -t-amp 2.99 --t-del 1.35 --stat-test wilcox.test --pval 0.05 --plot-top-peaks 10 --left-ext 100000 --right-ext 100000
Name "Getopt::Long::CallBack::OVERLOAD" used only once: possible typo at /usr/share/perl5/overload.pm line 34.
Value "-amp" invalid for option t (number expected)
Unknown option: amp
########################################################
######             SV-HotSpot v1.0.0              ######
########################################################
 SVs file              : SV-Hotspot/test_data/sv.bedpe
 Genome name           : hg38
 Sliding window size   : 100000
 Sliding window step   : 30000
 Output directory      : OUTPUT/sv-hotspot-output
 Annotation file       : SV-Hotspot/annotations/hg38/genes.bed
 peakPick window size  : 100
 peakPick minimum SD   : 5
 % of samples cutoff   : 10
 Expression file       : SV-Hotspot/test_data/exp.tsv
 Copy number file      : SV-Hotspot/test_data/cna.tsv
 Amplification cutoff  : 2.8
 Deletion cutoff       : 1.35
 Genes of interest     : 0
 Region of interest    : SV-Hotspot/test_data/enhancers.bed
 Chromosomes to analyze: chrX
 SV types to analyze   : ALL
 Peak merge distance   : 50000
 Number of nearby genes: 1
 ChIP-Seq cov. file    : SV-Hotspot/test_data/H3K27ac.bg
 Statistical test      : wilcox.test
 Plot top peaks        : 10
 ChIP-seq coverage     : H3K27ACETYL
 Left extension size   : 100000
 Right extension size  : 100000
 % of samples to merge : 5
 # of stop-merge peaks : 0
########################################################

--------------------------------------------------
Input Verification Step
--------------------------------------------------
Checking structural variants file format...PASS.
Checking annotation file format...PASS.
Checking region of interest file(s) format...PASS.
Checking ChIP-Seq coverage file format...PASS.
Checking copy number file format...PASS.
Checking if the expression file has no duplicated rows...PASS.
Checking if feature name in the expression matches the one in the annotation file...PASS.

--------------------------------------------------
STEP 1: Identifying Peaks (hotspot regions) 
--------------------------------------------------
Segmenting the genome into sliding windows
Chromosomes to segment:
[1] "chrX"
chrX 
Overlapping breakpoints with sliding windows
Splitting the overlapped file by chromosome
Summarizing sample counts for chromosome chrX 
Reading window data...
5202 unique windows were found
Summarizing...
Counting samples for "ALL" structural variants... 5202 windows reported (including windows w/o count).
Counting samples for "INV" structural variants... 5202 windows reported (including windows w/o count).
Counting samples for "BND" structural variants... 5202 windows reported (including windows w/o count).
Counting samples for "DUP" structural variants... 5202 windows reported (including windows w/o count).
Counting samples for "DEL" structural variants... 5202 windows reported (including windows w/o count).
Counting samples for "INS" structural variants... 5202 windows reported (including windows w/o count).

Call structural variant peaks (hotspots)
Reading sliding window sample count...
Chromosomes to analyze:
[1] "chrX"
chrX 
Number of peaks:  153 
Warning messages:
1: Removed 59 rows containing missing values (geom_bar). 
2: Removed 1 rows containing missing values (geom_text). 

--------------------------------------------------
STEP 2: Annotating Peaks 
--------------------------------------------------
Summarizing annotated peaks...Done.

------------------------------------------------------------------------------------
STEP 3: Determining the association between SVs and the expression of nearby genes
-------------------------------------------------------------------------------------
CMD to run: Rscript SV-Hotspot/src/determine_gene_association_v2.r OUTPUT/sv-hotspot-output SV-Hotspot/test_data/exp.tsv SV-Hotspot/test_data/cna.tsv 2.8 1.35 0.05 wilcox.test 
Reading all breakpoints...done.
Reading peak information...done.
Reading expression data...done.
Reading copy number data and overlap with genes...done.
Preparing data for statistical test...NULL
done.
Running statistical test...done.

--------------------------------------------------
STEP 4: Visualizing hotspot regions 
--------------------------------------------------
Reading all breakpoints...done.
Reading sliding window sample count...done.
Reading copy number data for genes and peaks...Error: Peaks copy number file "OUTPUT/sv-hotspot-output/processed_data/peaks_with_cn.bed" was not found!.
Execution halted

Would appreciate any suggestion on how to make it work.

Eteleeb commented 3 years ago

Hi Kdreval,

Thank you for trying our tool. We greatly appreciate your interest.

First, I noticed from your command line that you are missing one dash when specifying the amp threshold (-t-amp 2.99) which should be (--t-amp 2.99). Can you re-run the command with this correction and let us know please?.

Thank you for pointing out the issue with "combine_roi_files.r". I will look at it and fix it.

Regards,

-Abdallah

Kdreval commented 3 years ago

Thank you Abdallah for answering so quickly! I have noticed that too and tried running with the correct flag, it did not solve the error. This is log from the version that points to the combine_roi_files.r script path in the annotate_peaks.sh:

perl SV-Hotspot/src/sv-hotspot.pl -o OUTPUT -g hg38 -C chrX --sv SV-Hotspot/test_data/sv.bedpe -e SV-Hotspot/test_data/exp.tsv -c SV-Hotspot/test_data/cna.tsv -r SV-Hotspot/test_data/enhancers.bed --chip-cov SV-Hotspot/test_data/H3K27ac.bg --chip-cov-lbl H3K27ACETYL -w 100000 -s 30000 -d 50000 --t-amp 2.99 --t-del 1.35 --stat-test wilcox.test --pval 0.05 --plot-top-peaks 10 --left-ext 100000 --right-ext 100000
Name "Getopt::Long::CallBack::OVERLOAD" used only once: possible typo at /usr/share/perl5/overload.pm line 34.
########################################################
######             SV-HotSpot v1.0.0              ######
########################################################
 SVs file              : SV-Hotspot/test_data/sv.bedpe
 Genome name           : hg38
 Sliding window size   : 100000
 Sliding window step   : 30000
 Output directory      : OUTPUT/sv-hotspot-output
 Annotation file       : SV-Hotspot/annotations/hg38/genes.bed
 peakPick window size  : 100
 peakPick minimum SD   : 5
 % of samples cutoff   : 10
 Expression file       : SV-Hotspot/test_data/exp.tsv
 Copy number file      : SV-Hotspot/test_data/cna.tsv
 Amplification cutoff  : 2.99
 Deletion cutoff       : 1.35
 Genes of interest     : 0
 Region of interest    : SV-Hotspot/test_data/enhancers.bed
 Chromosomes to analyze: chrX
 SV types to analyze   : ALL
 Peak merge distance   : 50000
 Number of nearby genes: 1
 ChIP-Seq cov. file    : SV-Hotspot/test_data/H3K27ac.bg
 Statistical test      : wilcox.test
 Plot top peaks        : 10
 ChIP-seq coverage     : H3K27ACETYL
 Left extension size   : 100000
 Right extension size  : 100000
 % of samples to merge : 5
 # of stop-merge peaks : 0
########################################################

--------------------------------------------------
Input Verification Step
--------------------------------------------------
Checking structural variants file format...PASS.
Checking annotation file format...PASS.
Checking region of interest file(s) format...PASS.
Checking ChIP-Seq coverage file format...PASS.
Checking copy number file format...PASS.
Checking if the expression file has no duplicated rows...PASS.
Checking if feature name in the expression matches the one in the annotation file...PASS.

--------------------------------------------------
STEP 1: Identifying Peaks (hotspot regions) 
--------------------------------------------------
Segmenting the genome into sliding windows
Chromosomes to segment:
[1] "chrX"
chrX 
Overlapping breakpoints with sliding windows
Splitting the overlapped file by chromosome
Summarizing sample counts for chromosome chrX 
Reading window data...
5202 unique windows were found
Summarizing...
Counting samples for "ALL" structural variants... 5202 windows reported (including windows w/o count).
Counting samples for "INV" structural variants... 5202 windows reported (including windows w/o count).
Counting samples for "BND" structural variants... 5202 windows reported (including windows w/o count).
Counting samples for "DUP" structural variants... 5202 windows reported (including windows w/o count).
Counting samples for "DEL" structural variants... 5202 windows reported (including windows w/o count).
Counting samples for "INS" structural variants... 5202 windows reported (including windows w/o count).

Call structural variant peaks (hotspots)
Reading sliding window sample count...
Chromosomes to analyze:
[1] "chrX"
chrX 
Number of peaks:  153 
Warning messages:
1: Removed 59 rows containing missing values (geom_bar). 
2: Removed 1 rows containing missing values (geom_text). 

--------------------------------------------------
STEP 2: Annotating Peaks 
--------------------------------------------------
Summarizing annotated peaks...Done.

------------------------------------------------------------------------------------
STEP 3: Determining the association between SVs and the expression of nearby genes
-------------------------------------------------------------------------------------
CMD to run: Rscript SV-Hotspot/src/determine_gene_association_v2.r OUTPUT/sv-hotspot-output SV-Hotspot/test_data/exp.tsv SV-Hotspot/test_data/cna.tsv 2.99 1.35 0.05 wilcox.test 
Reading all breakpoints...done.
Reading peak information...done.
Reading expression data...done.
Reading copy number data and overlap with genes...done.
Preparing data for statistical test...NULL
done.
Running statistical test...done.

--------------------------------------------------
STEP 4: Visualizing hotspot regions 
--------------------------------------------------
Reading all breakpoints...done.
Reading sliding window sample count...done.
Reading copy number data for genes and peaks...Error: Peaks copy number file "OUTPUT/sv-hotspot-output/processed_data/peaks_with_cn.bed" was not found!.
Execution halted

It is my understanding that the missing file is created in the determine_gene_association_v2.r? I can confirm it is not being generated. Here would be the directory structure of the OUTPUT folder:

OUTPUT/
└── sv-hotspot-output
    ├── annotated_peaks_summary.tsv
    ├── genes.associated.with.SVs.tsv
    ├── peaks
    │   ├── all_peaks.bed
    │   ├── chrX.peak.bed
    │   ├── chrX.peak.group.bed
    │   ├── chrX.png
    │   └── ucsc_custom_track_files
    └── processed_data
        ├── all_bp.bed
        ├── annotated_peaks_summary.tsv
        ├── counts.rds
        ├── del_dup_sv.bed
        ├── fisher-results.tsv
        ├── genes.bed
        ├── genes_with_cn.bed
        ├── peaks_overlap_bp.tsv
        ├── peaks_overlap_with_region_of_interest.tsv
        └── peaks_with_overlap_nearby_genes.tsv

4 directories, 16 files

Please let me know if there is more information I can provide to help with this issue and thanks again for working on it!

Eteleeb commented 3 years ago

Thank you for pointing to this bug. I did fix the bug, can you try and let me know please?. Please pull the scripts again to get the updated script.

Kdreval commented 3 years ago

I can confirm I run it with the demo data. The only thing is that it seems to be requiring data.table package which you might want to add to the list of dependencies. Please feel free to close the issue as resolved. Thank you for the amazing support! I appreciate your time and help with this issue!

Eteleeb commented 3 years ago

I am glad that you were able to run the demo and thank you for pointing out the missing of data.table package. Looks like we forgot to include it when we updated the instruction page as it was there in the old version.

Thank you again for using our tool and please feel free to contact us for any additional support you may need in the future.

Regards,

-Abdallah