AdamaJava / adamajava

input/output format issue on qbasepileup SNP mode #303

Closed ChristinaXu2017 closed 2 years ago

ChristinaXu2017 commented 2 years ago

Describe the bug

There is an option to specify the SNP position file format and another to specify the output format: "-f Format of SNPs file [ddc1,maf,db,dccq,vcf,torrent,RNA,DNA], Def=dcc1" and "--of Output file format [rows,columns]". We suspect these descriptions are incorrect or ambiguous.

To Reproduce

Here we tested "-f" and "--of" with three SNP position files:

  1. SNP position file in txt format with tab delimiters, e.g. DPYD_97981343_A_C,T chr1 97981343 97981343 A C,T. Of five runs, two failed and three succeeded; the successful runs produced the same output in column format.
  2. The same SNP position file as above, but without the "--of" option or with another value. Three runs failed and two succeeded; the successful runs produced the same output in row format.
    • -f txt --of any: the run failed with an exception because "txt" is not supported.
    • -f tab --of any: the run succeeded; the log file shows the input as "tab" format.
    • -f columns --of any: the run succeeded; the log file shows the input as "columns" format.
    • -f maf --of any: the run failed with an exception because the input is not in MAF format.
    • -f vcf --of any: the run failed with an exception because the input is not in VCF format.

  3. SNP position file in MAF format. Both runs succeeded and produced the same output.

  4. SNP position file in VCF format.
    • -f vcf --of columns: the run failed; the log file shows both the input and the output as "columns" format, even though the input is VCF, not columns.
    • -f vcf --of any: the run succeeded; the log file shows the input as "vcf" format.

Expected behavior

  1. "--of columns" should only work for a "column" format SNP file. The existing code always converts any input format to "columns" if this option is specified.
  2. The "tab" and "columns" formats are treated the same: whether we specify "tab" or "columns" for the same SNP position file, we get the same output. "columns" should be removed to avoid unnecessary confusion. This would also make SNP mode consistent with coverage mode, which takes "tab" but not "columns".
  3. "txt" is not supported; "columns" should be removed because the code treats it the same as "tab".
  4. SNP mode should only support the "columns", "maf" and "vcf" formats; "dccq" and "dcc1" are deprecated.
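
One possible way to enforce the expected behavior is to validate the "-f" / "--of" pair up front instead of silently converting the input. The sketch below is only an illustration: the class `SnpModeOptions` and its `validate` method are hypothetical and are not part of qbasepileup. It assumes that "tab" replaces "columns" as the input format name (points 2 and 3), that "rows" is the default output format when "--of" is omitted, and that output values outside [rows,columns] are rejected.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

/** Hypothetical option check for SNP mode; names are illustrative only. */
final class SnpModeOptions {
    // Assumption: "tab" is kept and "columns" is dropped as an input format name.
    static final Set<String> INPUT_FORMATS = Set.of("tab", "maf", "vcf");
    // Output formats as listed in the "--of" help text.
    static final Set<String> OUTPUT_FORMATS = Set.of("rows", "columns");

    private SnpModeOptions() {}

    /**
     * Returns an error message if the option pair should be rejected,
     * or an empty Optional if it is acceptable.
     * inputFormat must be non-null; outputFormat may be null when "--of" is omitted.
     */
    static Optional<String> validate(String inputFormat, String outputFormat) {
        String in = inputFormat.toLowerCase(Locale.ROOT);
        // Assumption: row output is the default when "--of" is not given.
        String out = outputFormat == null ? "rows" : outputFormat.toLowerCase(Locale.ROOT);

        if (!INPUT_FORMATS.contains(in)) {
            return Optional.of("Unsupported SNP file format: " + inputFormat);
        }
        if (!OUTPUT_FORMATS.contains(out)) {
            return Optional.of("Unsupported output format: " + outputFormat);
        }
        // Point 1: column output only makes sense for a tab-delimited SNP file.
        if (out.equals("columns") && !in.equals("tab")) {
            return Optional.of("--of columns requires -f tab, but got -f " + inputFormat);
        }
        return Optional.empty();
    }
}
```

With a check like this, "-f vcf --of columns" would fail with an explicit message rather than being logged as "columns" input, and "-f txt", "-f dcc1" or "-f columns" would be rejected before any file is read.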
