Closed rashindrie closed 3 years ago
@Rashindrie you are right, I forgot to explain how to format the file. The $crt
file needs to be in tab delimeted format and without the header. So you can use the following command:
tail -n+2 $crt | tr ',' '\t' | sponge $crt
And it should make the file compatible with the awk
command to generate the samples_xcl_list.txt
file.
Thanks @freeseek!
Another question, I believe the below line is trying to copy the CHROM,FROM,TO,JK
columns from $dup
file. However, my dup file does not contain any specific columns.
Command:
dup="$HOME/GRCh38/segdups.bed.gz"
bcftools annotate --no-version -Ou -a $dup -c CHROM,FROM,TO,JK -h /dev/stdin $dir/$pfx.unphased.bcf |
Contents in segdups.bed.gz
less $HOME/GRCh38/segdups.bed.gz
chr1 10000 10485 - 0.00711937
chr1 10485 18392 + 0.00579252
chr1 18392 87112 - 0.00457824
chr1 87113 88000 + 0.00686326
chr1 88000 110582 - 0.00653207
Because of this I am getting the below error:
Could not parse JK at chr1:632287 .. [-]
Command used to create segdups.bed.gz
.
wget -O- http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/genomicSuperDups.txt.gz | gzip -d |
awk '!($2=="chrX" && $8=="chrY" || $2=="chrY" && $8=="chrX") {print $2"\t"$3"\t"$4"\t"$30}' > genomicSuperDups.bed
awk '{print $1,$2; print $1,$3}' genomicSuperDups.bed | \
sort -k1,1 -k2,2n | uniq | \
awk 'chrom==$1 {print chrom"\t"pos"\t"$2} {chrom=$1; pos=$2}' | \
bedtools intersect -a genomicSuperDups.bed -b - | \
bedtools sort | \
bedtools groupby -c 4 -o min | \
awk 'BEGIN {i=0; s[0]="+"; s[1]="-"} {if ($4!=x) i=(i+1)%2; x=$4; print $0"\t0\t"s[i]}' | \
bedtools merge -s -c 4 -o distinct | \
grep -v "GL\|KI" | bgzip > $HOME/GRCh38/segdups.bed.gz && \
tabix -f -p bed $HOME/GRCh38/segdups.bed.gz
I am not sure why JK is not present in the file. Would you be able to provide some guidance.
Thank you
Make sure bedtools version is 2.27 or newer, or the segdups.bed.gz
file will be malformed.
Looks like I was using bedtools 2.26.0. I'll try updating.
Thank you!
Hi,
Im trying to use mocha on my genotype array data. After creating the minimal BCF, I tried the following command
This gave me the below error:
The format of the file I currently use for
$crt
I wasn't sure how the
$crt
file should look like, so I created a CSV in the following format. I was wondering if there is a set specification for this file?Thanks, Rashindrie