MicroB3-IS / osd-analysis

Repository for all Ocean Sampling Day related source code, with information on how to acquire OSD data
Apache License 2.0

Wrong cluster reference sequence in otu file #27

Open ikostadi opened 8 years ago

ikostadi commented 8 years ago

Dear all,

unfortunately, a bug in the SILVAngs system has affected one of the files in the results directory of all projects analysed before September 22nd 2015 (that means all currently available OSD results calculated with SILVAngs). The affected file is the 'prjname---ssu_or_lsu---otus.csv' file in the 'exports' subdirectory of your results.

The sequence listed for each cluster / OTU in this file is not the reference sequence of that cluster / OTU; instead, it was selected at random from all sequences of that cluster. All other data in this file are correct.

All other sequence exports are not affected and contain the correct sequence for each cluster. The unaffected files include the ARB and FASTA files in the main results directory, as well as the FASTA exports in the 'exports/otu_references' subdirectory.
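One way to see which rows of an affected file carry a non-reference sequence is to compare its sequence column against one of the unaffected FASTA exports. The following is only a minimal sketch under stated assumptions: the column names passed as `id_column` and `seq_column` are hypothetical placeholders (the actual header of the otus.csv file is not given in this thread), and it assumes that the first word of each FASTA header matches the cluster identifier used in the CSV. The toy data at the bottom is invented purely for illustration.

```python
import csv
import io


def read_fasta(handle):
    """Parse FASTA text into {first word of header: sequence}."""
    records = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if name is not None:
                records[name] = "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def mismatched_rows(csv_handle, references, id_column, seq_column, delimiter=","):
    """Return the IDs whose CSV sequence differs from the FASTA reference.

    id_column and seq_column are hypothetical names; adjust them to the
    real header of the file being checked. IDs that are missing from the
    FASTA references are skipped rather than reported.
    """
    reader = csv.DictReader(csv_handle, delimiter=delimiter)
    bad = []
    for row in reader:
        ref = references.get(row[id_column])
        if ref is not None and ref.upper() != row[seq_column].upper():
            bad.append(row[id_column])
    return bad


# Toy data, invented for illustration only.
toy_fasta = io.StringIO(">c1 some description\nACGT\nAC\n>c2\nGGGG\n")
toy_csv = io.StringIO("cluster_id,sequence\nc1,ACGTAC\nc2,TTTT\n")

refs = read_fasta(toy_fasta)
print(mismatched_rows(toy_csv, refs, "cluster_id", "sequence"))
```

The comparison is an exact, case-insensitive string match, so any difference in alphabet or alignment gaps between the two exports would need to be normalised first.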

The bug has since been fixed, and all results generated from September 22nd 2015 onwards are correct.

gipsilim commented 8 years ago

Hello Ivo,

I just saw this post. I have used two files dating from before Sept 22. One of them is clearly affected: osd2014_18s_lgc_otu_by_sample.csv, but I am not sure about the other one: osd2014_EMG-SINA-SILVA-119.1_otu_by_sample.tsv. Could you please tell me?

Best, Gipsi

ikostadi commented 8 years ago

Hi Gipsi,

The files you mentioned are not the type of file referred to in the original post (please compare the naming scheme). The affected files contain comma- (or tab-) separated information about all per-sample OTUs (in SILVA speak, "local clusters"), whereas the files you mention:

  1. do not contain any sequences in them, only (global) cluster (i.e. OTU) counts
  2. are generated in a way that is independent of the error in the *-otus.csv files

Therefore, they should not be affected in any way by the pipeline bug. If you have any doubt, please feel free to double-check, and don't hesitate to ask again or notify us should you find anything suspicious.
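The naming-scheme comparison can be sketched in a few lines of Python. This is a rough illustration, not part of the SILVAngs pipeline: it assumes that an affected file can be recognised purely by a name ending in `---otus.csv`, as in the 'prjname---ssu_or_lsu---otus.csv' pattern from the original post. The helper name `is_affected_name` and the example name `myproject---ssu---otus.csv` are made up for this sketch.

```python
import re

# Assumption: affected files are identified only by the "---otus.csv"
# suffix of the 'prjname---ssu_or_lsu---otus.csv' naming scheme.
AFFECTED_NAME = re.compile(r".+---otus\.csv$")


def is_affected_name(filename: str) -> bool:
    """Return True if the file name follows the affected naming scheme."""
    return AFFECTED_NAME.match(filename) is not None


names = [
    "myproject---ssu---otus.csv",  # hypothetical name following the scheme
    "osd2014_18s_lgc_otu_by_sample.csv",
    "osd2014_EMG-SINA-SILVA-119.1_otu_by_sample.tsv",
]
for name in names:
    print(name, is_affected_name(name))
```

A check like this only looks at the file name; it says nothing about when the file was generated, so the September 22nd 2015 cut-off still has to be checked separately.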

Best, Ivo

gipsilim commented 8 years ago

Hi Ivo,

Thanks,

I thought one of them could be affected, the 18S one: osd2014_18s_lgc_otu_by_sample.csv. It is true that it does not follow the same naming scheme as prjname---ssu_or_lsu---otus.csv.

anyway, better like this :)

cheers,

Gipsi
