dmnfarrell / smallrnaseq

small rna-seq analysis package
GNU General Public License v3.0

KeyError: "['sample_col' 'factor_col'] not in index" #1

Closed sateeshperi closed 5 years ago

sateeshperi commented 5 years ago

I am trying to run the smallrnaseq pipeline for differential expression (DE). Below are my files:

1) head rna_counts.csv:

name,ref,Index31s_CP_C,Index32s_CP_C,Index34s_CP_V,Index35s_CP_V,Index31s_CP_C_norm,Index32s_CP_C_norm,Index34s_CP_V_norm,Index35s_CP_V_norm,total_reads,mean_norm
URS000059E1FE_10116,rattus_piRNA,207622.0,190965.0,239008.0,318060.0,189478.31,130573.11,198559.29,237604.68,955655.0,189053.84749999997
URS00003CC5C5_10116,rattus_piRNA,150770.0,413600.0,87051.0,222064.0,137594.5,282800.71,72318.85,165891.48,873485.0,164651.385
URS0000444A5C_10116,rattus_piRNA,236086.0,184428.0,216096.0,110362.0,215454.9,126103.41,179524.82,82445.22,746972.0,150882.0875

2) cat metadata.txt:

sample_s    group_s replicate
Index31s_CP_C   control s1
Index32s_CP_C   control s2
Index34s_CP_V   vinclo  s1
Index35s_CP_V   vinclo  s2

3) config file:

[base]
filenames = Index31s_CP_C.fastq,Index32s_CP_C.fastq,Index34s_CP_V.fastq,Index35s_CP_V.fastq
path =
overwrite = 0
adapter =  AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA
index_path = indexes
libraries = rattus_miRNA,rattus_piRNA
ref_fasta = genome.fa
features = rn6_genes.gtf
output = results
add_labels = 0
aligner = bowtie
mirna = 0
species = rno
pad5 = 3
pad3 = 5
verbose = 1
cpus = 8

[aligner]
default_params = -v 1 --best
mirna_params = -v 1 -a --best --strata --norc

[de]
sample_labels = metadata.txt
sep = tab #tab delimiter
count_file = rna_counts.csv
sample_col = sample_s
factors_col = group_s
conditions = control,vinclo
logfc_cutoff = 1.5

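The `sep = tab #tab delimiter` line puts a comment on the same line as the value, and the ParserWarning in the output shows that `read_csv` received a separator longer than one character. This thread doesn't show how smallrnaseq reads its config, so the following is only a sketch. It assumes a reader built on Python 3's standard `configparser`. With default settings, the inline `#tab delimiter` text stays part of the value, and it is dropped only when `inline_comment_prefixes` is set. The helper `read_de_options` is hypothetical.

```python
import configparser

# The [de] section from the config above, verbatim.
DE_SECTION = """
[de]
sample_labels = metadata.txt
sep = tab #tab delimiter
count_file = rna_counts.csv
sample_col = sample_s
factors_col = group_s
conditions = control,vinclo
logfc_cutoff = 1.5
"""


def read_de_options(text, strip_inline_comments):
    """Parse the [de] section with the standard-library configparser (hypothetical helper)."""
    # By default configparser only recognizes whole-line comments, so an
    # inline "#..." is kept as part of the value.
    prefixes = ("#",) if strip_inline_comments else None
    parser = configparser.ConfigParser(inline_comment_prefixes=prefixes)
    parser.read_string(text)
    return dict(parser["de"])


default_opts = read_de_options(DE_SECTION, strip_inline_comments=False)
stripped_opts = read_de_options(DE_SECTION, strip_inline_comments=True)
print(repr(default_opts["sep"]))   # 'tab #tab delimiter'
print(repr(stripped_opts["sep"]))  # 'tab'
```

Putting comments on their own line (for example `# tab delimiter` above `sep = tab`) avoids depending on how any particular reader treats inline comments.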
Now, the error I keep getting:

running differential expression
/home/sateeshp/.local/lib/python2.7/site-packages/smallrnaseq/app.py:334: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
  labels = pd.read_csv(labelsfile, sep=sep)
using these labels:
Traceback (most recent call last):
  File "/home/sateeshp/.local/bin/smallrnaseq", line 11, in <module>
    sys.exit(main())
  File "/home/sateeshp/.local/lib/python2.7/site-packages/smallrnaseq/app.py", line 484, in main
    diff_expression(options)
  File "/home/sateeshp/.local/lib/python2.7/site-packages/smallrnaseq/app.py", line 343, in diff_expression
    print (labels[[samplecol, factorcol]].sort_values(factorcol))
  File "/home/sateeshp/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 2682, in __getitem__
    return self._getitem_array(key)
  File "/home/sateeshp/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 2726, in _getitem_array
    indexer = self.loc._convert_to_indexer(key, axis=1)
  File "/home/sateeshp/.local/lib/python2.7/site-packages/pandas/core/indexing.py", line 1327, in _convert_to_indexer
    .format(mask=objarr[mask]))
KeyError: "['sample_s' 'group_s'] not in index"
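The traceback ends in `labels[[samplecol, factorcol]]`. pandas raises a KeyError for that selection when either name is missing from the parsed column headers. Below is a minimal Python 3 sketch that uses the metadata.txt contents above. The hypothetical `select_labels` helper mirrors that call but checks the columns first. The sketch also shows one way the headers can go missing: reading the tab-separated file with the wrong separator turns the whole header line into a single column name. It illustrates the error message only, not the fix the reporter eventually found.

```python
import io

import pandas as pd

# metadata.txt from above; the file is tab-separated.
METADATA_TSV = (
    "sample_s\tgroup_s\treplicate\n"
    "Index31s_CP_C\tcontrol\ts1\n"
    "Index32s_CP_C\tcontrol\ts2\n"
    "Index34s_CP_V\tvinclo\ts1\n"
    "Index35s_CP_V\tvinclo\ts2\n"
)


def select_labels(labels, sample_col, factor_col):
    """Mirror the failing selection, but report missing columns explicitly."""
    missing = [c for c in (sample_col, factor_col) if c not in labels.columns]
    if missing:
        raise KeyError(
            "{} not found; parsed columns are {}".format(missing, list(labels.columns))
        )
    return labels[[sample_col, factor_col]].sort_values(factor_col)


# Correct separator: the three header fields become three columns.
good = pd.read_csv(io.StringIO(METADATA_TSV), sep="\t")
print(select_labels(good, "sample_s", "group_s"))

# Wrong separator: the whole header line collapses into one column name.
bad = pd.read_csv(io.StringIO(METADATA_TSV), sep=",")
try:
    select_labels(bad, "sample_s", "group_s")
except KeyError as err:
    print(err)
```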
dmnfarrell commented 5 years ago

Your counts file appears to have no column names? It should be in the form indicated in the docs, with a name column and sample columns corresponding to those in your metadata.txt file, e.g.:

name,type,Index31s_CP_C,Index32s_CP_C,Index34s_CP_V,Index35s_CP_V
URS000059E1FE_10116,rattus_piRNA,207622.0,190965.0,239008.0,318060.0
..
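The reply above says every sample named in metadata.txt needs a matching column in the counts file. That check takes only a few lines of pandas. This is an illustration under stated assumptions, not smallrnaseq's own validation. `unmatched_samples` is a hypothetical helper, and the data are the counts excerpt in the layout shown in the reply plus the metadata posted earlier in this thread.

```python
import io

import pandas as pd

# Counts excerpt in the documented layout: a name column, a type column,
# then one column per sample.
COUNTS_CSV = (
    "name,type,Index31s_CP_C,Index32s_CP_C,Index34s_CP_V,Index35s_CP_V\n"
    "URS000059E1FE_10116,rattus_piRNA,207622.0,190965.0,239008.0,318060.0\n"
)
# metadata.txt as posted (tab-separated).
METADATA_TSV = (
    "sample_s\tgroup_s\treplicate\n"
    "Index31s_CP_C\tcontrol\ts1\n"
    "Index32s_CP_C\tcontrol\ts2\n"
    "Index34s_CP_V\tvinclo\ts1\n"
    "Index35s_CP_V\tvinclo\ts2\n"
)


def unmatched_samples(counts, labels, sample_col):
    """Return metadata sample names that have no column in the counts table."""
    return [s for s in labels[sample_col] if s not in counts.columns]


counts = pd.read_csv(io.StringIO(COUNTS_CSV))
labels = pd.read_csv(io.StringIO(METADATA_TSV), sep="\t")

# The documented layout also expects a "name" column.
print("name" in counts.columns)                       # True
print(unmatched_samples(counts, labels, "sample_s"))  # []
```

A non-empty list points to a naming mismatch (for example a missing `s` in `Index31s_CP_C`) between the two files.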
sateeshperi commented 5 years ago

I just updated the pandas module in Python and it works now, so I'm guessing the sample names in my counts file do match those in metadata.txt.