Anderson-Lab / SNV-DA

MIT License

problem with running snvDA.R #2

Open Hecate08 opened 8 years ago

Hecate08 commented 8 years ago

Hello,

I am trying to run SNV-DA on my 40 samples; the first 20 samples are in group 1 and the last 20 samples are in group 2. I created the variant files manually in the requested format. I used this command to run snvDA.R:

Rscript snvDA.R -M SNVM.unfiltered.mod.csv -A sensitive_vs_resistant \
  -D 20 -U sensitive -H resistant -G exonic -Z 3 \
  -O -J -W 10 -V 10 -I 15 -T 5 -L 10 -B 1000 -E 40

But I got this error message:

Warning message:
replacing previous import ‘igraph::%>%’ by ‘rgl::%>%’ when loading ‘mixOmics’
Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
  duplicate 'row.names' are not allowed
Calls: rownames<- -> row.names<- -> row.names<-.data.frame
In addition: Warning message:
non-unique value when setting 'row.names': ‘0’
Execution halted

How can I fix it?

Thank you, Hecate

biomrpaul commented 8 years ago

Hi Hecate,

Your error is a common one when analyzing data in R. Row names in an R data frame must be unique. In your case, the error is saying that there is more than one row with the label "0". This suggests to me that your SNVM input file is not formatted correctly.

Below are the first few lines from an example SNVM:

SNV,annot,DF0,DF1,DF10,DF2,DF3,DF4,DF5,R0,R1,R2,R3,R4,
COL6A2_chr21:47552130_A->G,syn_exonic,0,0,0,0.368,0.627,0,0,0,0,0.600
STEAP3_chr2:119986965_C->G,intronic,Na,Na,Na,Na,Na,Na,Na,Na,0.400,Na

The first line has two column names for the SNV ID and its annotation, followed by the sample names. Each subsequent line needs a unique ID for the variant, its annotation, and then the values.
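A quick check along these lines can flag the problem before running snvDA.R. This is only a sketch in Python and not part of SNV-DA: the function name `find_snvm_problems` is hypothetical, and it assumes the layout described above (comma-separated, with the variant ID in the first column and a single header line).

```python
from collections import Counter

def find_snvm_problems(lines):
    """Return a list of human-readable problems found in SNVM lines.

    Hypothetical helper, not part of SNV-DA. It splits on every comma
    and ignores CSV quoting, so a quoted ID that contains a comma shows
    up as a fragment that starts with a quotation mark.
    """
    problems = []
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    ids = [row.split(",")[0] for row in rows[1:]]  # skip the header line

    # Row names must be unique, so report every ID seen more than once.
    for snv_id, count in Counter(ids).items():
        if count > 1:
            problems.append(f"duplicate ID {snv_id!r} appears {count} times")

    # A leading quotation mark suggests the ID itself contains a comma.
    for snv_id in ids:
        if snv_id.startswith('"'):
            problems.append(f"quoted ID starts with {snv_id!r}")
    return problems
```

Calling it on the lines of the SNVM file (for example `find_snvm_problems(open(path))`, where `path` is your own file) returns an empty list when neither problem is present.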

Let me know if this is not the case and we can troubleshoot further.

Thank you, Matt

Hecate08 commented 8 years ago

Hello Matt,

Thanks for your quick reply. I looked at the SNVM input. It turned out that some SNV IDs started with a quotation mark:

"PLVAP,BST2_chr19:17504521_C->T",intergenic,Na,0,0.705314,Na,Na,Na,Na,Na,0,0,1,Na,0,0,1,Na
PLVAP_chr19:17462499_G->A,UTR3,1,1,0.998233,0,0.527228,1,0.991803,1,1,1,1,Na,1,1,1,1,1,1,1

I think the cause is the comma separator between different genes. In the annotation file in CSV format, the fields are separated with commas too. So I changed the annotation output to txt and replaced the commas between gene names with semicolons. Afterwards the SNVM looks like this:
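To illustrate the cause, the sketch below (Python, for illustration only) splits a shortened version of the problematic row on every comma. Whether snvDA.R parses the file this way is an assumption; the point is only that the quoted multi-gene ID then breaks into two fields, while the semicolon version stays in one.

```python
# Shortened versions of the rows quoted in this thread; trailing values omitted.
quoted_row = '"PLVAP,BST2_chr19:17504521_C->T",intergenic,Na,0,0.705314'
fixed_row = 'PLVAP;BST2_chr19:17504521_C->T,intergenic,Na,0,0.705314'

def naive_fields(row):
    """Split on every comma, ignoring CSV quoting (assumed reader behaviour)."""
    return row.split(",")

quoted_fields = naive_fields(quoted_row)
fixed_fields = naive_fields(fixed_row)

# The quoted ID is cut in two, so every later value shifts one column right.
print(quoted_fields[:3])
# The semicolon-joined ID stays in one field; the annotation is the second field.
print(fixed_fields[:3])
```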

PLVAP;BST2_chr19:17504521_C->T,intergenic,Na,0,0.705314,Na,Na,Na,Na,Na,0,0,1,Na,0,0,1,Na,N
PLVAP_chr19:17462499_G->A,UTR3,1,1,0.998233,0,0.527228,1,0.991803,1,1,1,1,Na,1,1,1,1,1,1,1

The code is:

perl /programs/annovar/table_annovar.pl high_cov_unique_SNVs.anno_input \
  /programs/annovar/humandb/ -buildver hg19 -out unique_snvs -remove \
  -protocol refGene -operation g -nastring .

awk -v OFS="," '{n=split($7,a,",");name=a[1];for(i=2;i<=n;i++){name=name";"a[i]};print $1,$2,$3,$4,$5,"\""$6"\"","\""name"\"","\""$8"\"",$9,$10}' \
  unique_snvs.hg19_multianno.txt > unique_snvs.hg19_multianno.new.csv

Now the program runs without an error so far.
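The core substitution could also be sketched in Python. This is an illustration rather than a drop-in replacement for the awk command: `semicolon_gene_field` is a hypothetical helper, it assumes tab-separated input with the gene names in the seventh column (the awk command's `$7`), and it does not add the quoting that the awk command puts around some fields.

```python
def semicolon_gene_field(line, gene_col=6):
    """Replace commas with semicolons in the gene column of one row.

    Hypothetical helper. Assumes tab-separated fields and that the gene
    names sit at 0-based index `gene_col` (the awk command's $7).
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) > gene_col:
        fields[gene_col] = fields[gene_col].replace(",", ";")
    return fields

# Made-up example row, for illustration only.
row = "chr19\t17504521\t17504521\tC\tT\tintergenic\tPLVAP,BST2\t.\t.\t."
print(",".join(semicolon_gene_field(row)))
```

With the gene names joined by semicolons, the only commas left in the printed row are the field separators.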

Sincerely, Hecate

biomrpaul commented 8 years ago

Hi Hecate,

Thank you for noticing that the output of ANNOVAR needs to be specially processed to make sure there are no commas in the variant IDs. I will include this in the documentation.

Thank you, Matt