cancerit / ascatNgs

Somatic copy number analysis using paired-end whole-genome sequencing (WGS)
http://cancerit.github.io/ascatNgs/
GNU Affero General Public License v3.0

1000 genome SNP panel generation #50

Closed by mjz1 8 years ago

mjz1 commented 8 years ago

The code snippet intended to create the 1000g SNP panel:

$ export TG_DATA=ftp://ftp.ensembl.org/pub/grch37/release-83/variation/vcf/homo_sapiens/1000GENOMES-phase_3.vcf.gz
$ curl -sSL $TG_DATA | zgrep -F 'E_Multiple_observations' | grep -F 'TSA=SNV' |\
 perl -ane '\
 next if($F[0] !~ m/^\d+$/ && $F[0] !~ m/^[XY]$/); \
 next if($F[0] eq $l_c && $F[1]-1000 < $l_p); \
 $F[7]=~m/MAF=([^;]+)/; next if($1 < 0.05); \
 printf "%s\t%s\t%d\n", $F[2],$F[0],$F[1]; \
 $l_c=$F[0]; $l_p=$F[1]; \
 ' > SnpPositions_GRCh37_1000g.tsv

gives the error:

Can't modify single ref constructor in scalar assignment at -e line 6, near "];"
syntax error at -e line 7, at EOF
Execution of -e aborted due to compilation errors.
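
A likely explanation for the error above, offered as an assumption rather than something confirmed in this thread: inside a single-quoted shell string, a backslash followed by a newline is not a line continuation. The shell passes both characters to Perl unchanged, and Perl reads a leading `\` as its reference operator, so `\` followed by `$l_c=$F[0];` becomes an assignment to a reference constructor. The minimal sketch below reproduces this with a made-up two-line script (the variable names `script` and `backslash_compiles` are illustrative only); the exact error text may differ between Perl versions.

```shell
# Inside single quotes the shell keeps the backslash and the newline as-is.
script='print "a"; \
$x = 1;'
printf '%s\n' "$script"      # the backslash is still part of the Perl source

# Perl therefore parses `\ $x = 1`, an assignment to a reference
# constructor, and refuses to compile it. `perl -c` only checks syntax.
if perl -c -e "$script" 2>/dev/null; then
    backslash_compiles=yes
else
    backslash_compiles=no
fi
echo "compiles with trailing backslash: $backslash_compiles"
```

Backslash continuations are still needed outside the quotes (for example after a `|`), but not inside the quoted Perl code, where plain newlines are enough.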

keiranmraine commented 8 years ago

Thanks for reporting this. For some reason removing one of the escaped line feeds fixes this:

curl -sSL $TG_DATA | zgrep -F 'E_Multiple_observations' | grep -F 'TSA=SNV' |\
 perl -ane '\
 next if($F[0] !~ m/^\d+$/ && $F[0] !~ m/^[XY]$/); \
 next if($F[0] eq $l_c && $F[1]-1000 < $l_p); \
 $F[7]=~m/MAF=([^;]+)/; next if($1 < 0.05); \
 printf "%s\t%s\t%d\n", $F[2],$F[0],$F[1]; $l_c=$F[0]; $l_p=$F[1];' \
 > SnpPositions_GRCh37_1000g.tsv

I've updated the docs. Thanks.

mjz1 commented 8 years ago

Now this command is giving me an empty output file.

MimoriK commented 3 years ago

I have the same problem: an empty output file.

keiranmraine commented 3 years ago

Removing the line feeds from the Perl appears to resolve this; I don't know why they are having an impact:

curl -sSL $TG_DATA | zgrep -F 'E_Multiple_observations' | grep -F 'TSA=SNV' | \
perl -ane 'next if($F[0] !~ m/^\d+$/ && $F[0] !~ m/^[XY]$/); next if($F[0] eq $l_c && $F[1]-1000 < $l_p); $F[7]=~m/MAF=([^;]+)/; next if($1 < 0.05); printf "%s\t%s\t%d\n", $F[2],$F[0],$F[1]; $l_c=$F[0]; $l_p=$F[1];'