NCBI-Hackathons / Scan2CNV

MIT License

test bedtools to get intersection of multiple CNV callers #24

Open ekarlins opened 7 years ago

ekarlins commented 7 years ago

bedtools is installed on our NCI cluster. Test it using the example bed files in the directory where PennCNV is installed and the bed file that Nick uploaded to our repo. Pay close attention to what is output in columns 4 and onward after intersecting, and whether it makes sense.

Is there anything we would want other than intersection? If we end up with many callers (more than 2) do we just want intersection of all or would we want pairwise intersection?
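One way to compare the two options raised above is to run both on small toy files. The sketch below is only an illustration: the file names, caller labels, and coordinates are made up, and it assumes a bedtools version that provides the multiinter subcommand. With -wa -wb, intersect writes the full original record from each file for every overlap, which makes it easier to see where columns 4 and onward come from.

```shell
# Toy single-interval call sets; file names and coordinates are hypothetical.
printf 'chr1\t100\t500\n' > penncnv_calls.bed
printf 'chr1\t200\t600\n' > caller2_calls.bed
printf 'chr1\t300\t700\n' > caller3_calls.bed

if command -v bedtools >/dev/null 2>&1; then
    # Pairwise: report each overlap with the original A and B records side by side.
    bedtools intersect -a penncnv_calls.bed -b caller2_calls.bed -wa -wb \
        > penncnv_vs_caller2.bed

    # All callers at once: multiinter splits the genome into sub-intervals and
    # reports, in column 4, how many input files cover each one.
    bedtools multiinter -i penncnv_calls.bed caller2_calls.bed caller3_calls.bed \
        -names penncnv caller2 caller3 > all_callers.multiinter.bed

    # Keep only sub-intervals covered by all three inputs.
    awk -F'\t' '$4 == 3' all_callers.multiinter.bed > all_callers.consensus.bed
else
    echo "bedtools not found; skipping the intersect steps" >&2
fi
```

Filtering on a lower count in column 4 (for example 2) would give regions supported by at least two callers, which sits between strict intersection of all and pairwise comparison. multiinter expects its inputs to be sorted by chromosome and start.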

mtbrown22 commented 7 years ago

It's unable to open the bed file in our repo. Am I using the right file? I opened it up and it doesn't look like it has the same format as the bed file in our PennCNV directory.

[brownmt2@cgemsIII intersect_output]$ bedtools intersect -a /DCEG/Resources/Tools/PennCNV/PennCNV-1.0.3/example/output_expected/ex1.bed -b /DCEG/CGF/TempFileSwap/Maria/Hackathon/Global_Screening_Arrays/files/output/pennTest2_gsrcCNVcall.bed, ex1_pennTest2_gsrcCNVcall.bed
Error: Unable to open file /DCEG/CGF/TempFileSwap/Maria/Hackathon/Global_Screening_Arrays/files/output/pennTest2_gsrcCNVcall.bed,. Exiting.

@ekarlins

ekarlins commented 7 years ago

I put a couple other beds in files directory. You can try those.



mtbrown22 commented 7 years ago

Still getting an error

[brownmt2@cgemsIII intersect_output]$ bedtools intersect -a /DCEG/CGF/TempFileSwap/Maria/Hackathon/Global_Screening_Arrays/files/Test.gsrc.bed -b /DCEG/Resources/Tools/PennCNV/PennCNV-1.0.3/example/output_expected/ex1.bed, /DCEG/CGF/TempFileSwap/Maria/Hackathon/intersect_output/

Error: unable to open file or unable to determine types for file /DCEG/CGF/TempFileSwap/Maria/Hackathon/Global_Screening_Arrays/files/Test.gsrc.bed

@ekarlins

ekarlins commented 7 years ago

@mtbrown22, it looks like the problem with that file is that it's space-separated rather than tab-separated. You may also hit issues later when comparing the two files with intersect, because the PennCNV file has chromosome names that start with "chr" and the gsrc one does not, so the two files won't have any chromosome names in common. I'm also not sure the two beds are sorted the same way, so you might want to sort them both before running intersect.
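As a rough sketch of the sort-then-intersect step, something like the following could work once both files are tab-delimited and use the same chromosome naming. The file names and coordinates here are placeholders, not the real cluster paths. Two details are worth noting: bedtools intersect writes its result to standard output, so the output file is given with ">" rather than as an extra argument, and the first error message earlier in the thread quotes the -b path with a trailing comma, which suggests the comma was read as part of the file name.

```shell
# Placeholder inputs standing in for the two callers' BED files.
printf 'chr2\t500\t900\nchr1\t300\t400\nchr1\t100\t200\n' > callerA.bed
printf 'chr1\t150\t350\nchr2\t100\t600\n' > callerB.bed

# Sort by chromosome name, then numerically by start position.
sort -k1,1 -k2,2n callerA.bed > callerA.sorted.bed
sort -k1,1 -k2,2n callerB.bed > callerB.sorted.bed

if command -v bedtools >/dev/null 2>&1; then
    # No comma after the -b file; the result goes to a file via redirection.
    bedtools intersect -a callerA.sorted.bed -b callerB.sorted.bed \
        > callerA_callerB.intersect.bed
else
    echo "bedtools not found; skipping intersect" >&2
fi
```
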

Anyway, a quick fix for the space to tab problem is this:

sed 's/ /\t/g' Test.gsrc.bed > Test2.gsrc.bed

The command is basically find and replace. sed 's/something/new/g' would replace all instances of "something" in a file with "new". The same command without the "g" (global) would only replace the first "something" on each line. In the bed file example the "something" is one space and the "new" is "\t", which is a tab.
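To see the effect of the "g" flag described above, here is a small demonstration on a made-up one-line file. It counts tab-separated fields after each substitution. One assumption to flag: writing "\t" in the replacement is understood by GNU sed but not by every sed implementation, so this sketch passes a literal tab through a shell variable instead.

```shell
# A made-up space-separated record.
printf '1 100 200 loss\n' > spaces_demo.bed

# Hold a literal tab character so the command does not rely on sed expanding "\t".
TAB=$(printf '\t')

# Without "g": only the first space on each line becomes a tab.
sed "s/ /${TAB}/" spaces_demo.bed > first_only_demo.bed

# With "g": every space on each line becomes a tab.
sed "s/ /${TAB}/g" spaces_demo.bed > all_tabs_demo.bed

# Count tab-separated fields in each result.
awk -F'\t' '{ print NF }' first_only_demo.bed
awk -F'\t' '{ print NF }' all_tabs_demo.bed
```

If a file has runs of several spaces between columns, each space becomes its own tab and empty columns appear, so it is worth checking the field count on the real file after converting.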

See if you can use the same command to fix the issue with "chr" in one file but not the other. Hint: "^" in a sed pattern matches the beginning of the line.
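One possible answer to the hint, shown on made-up files: anchoring the pattern with "^" lets sed either add "chr" to the start of every line or strip it off. Which direction to go depends on which naming the downstream tools expect, and the file names below are hypothetical.

```shell
# Made-up file without the "chr" prefix.
printf '1\t100\t200\n22\t300\t400\n' > nochr_demo.bed

# Insert "chr" at the beginning of every line.
sed 's/^/chr/' nochr_demo.bed > withchr_demo.bed

# Made-up file with the prefix.
printf 'chr1\t100\t200\n' > haschr_demo.bed

# Remove "chr" only where it appears at the beginning of a line.
sed 's/^chr//' haschr_demo.bed > stripped_demo.bed
```

Note that the first command prefixes every line, so any header lines in a real file would be changed as well.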

I'll change the R code to output a tab delimited bed file, so it should work in the future.

846843e3b27babde01410ce33b3c9af2e7e5e855