NBISweden / IgDiscover-legacy

Analyze antibody repertoires and discover new V genes from high-throughput sequencing reads
https://www.igdiscover.se
MIT License
Be more helpful when validating the input database #46

Closed marcelm closed 7 years ago

marcelm commented 7 years ago

igdiscover init validates the input database and checks whether there are duplicate sequences.

The error message reports only the first duplicate sequence it finds, so init has to be re-run after each fix in order to find all the problems. This is annoying, especially if the GUI is used.

marcelm commented 7 years ago

@NestorVB I am going to fix this so that the input database is repaired automatically.

marcelm commented 7 years ago

It works as follows.

This is how it looks when running igdiscover init:

$ igdiscover init --library-name=testing --db=db-with-problems --reads=reads.1.fastq.gz test
INFO: Records 'VH2b' and 'VH2c' contain the same sequence, skipping 'VH2c'
INFO: Record 'empty_sequence' is empty, skipping it.
INFO: Record name 'same_name' occurs more than once, replaced with 'same_name_1'
INFO: Directory test initialized.
INFO: Edit test/igdiscover.yaml, then run "cd test && igdiscover run" to start the analysis
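
The three repairs visible in the log above (skip duplicate sequences, skip empty records, rename duplicate names) could be sketched roughly as below. This is only an illustration and not IgDiscover's actual code: the function name `repair_database`, the `(name, sequence)` tuple format, and the case-insensitive sequence comparison are all assumptions.

```python
from typing import Dict, List, Set, Tuple

Record = Tuple[str, str]  # (name, sequence); hypothetical record format


def repair_database(records: List[Record]) -> Tuple[List[Record], List[str]]:
    """Return (repaired records, log messages). Illustrative sketch only."""
    repaired: List[Record] = []
    messages: List[str] = []
    seen_sequences: Dict[str, str] = {}  # sequence -> name of the record kept
    used_names: Set[str] = set()

    for name, sequence in records:
        # Assumption: sequences are compared case-insensitively.
        normalized = sequence.upper()

        # Drop records without any sequence.
        if not normalized:
            messages.append(f"Record {name!r} is empty, skipping it.")
            continue

        # Drop records whose sequence was already seen under another name.
        if normalized in seen_sequences:
            first = seen_sequences[normalized]
            messages.append(
                f"Records {first!r} and {name!r} contain the same sequence, "
                f"skipping {name!r}"
            )
            continue

        # Make the name unique by appending _1, _2, ... if necessary.
        new_name = name
        suffix = 0
        while new_name in used_names:
            suffix += 1
            new_name = f"{name}_{suffix}"
        if new_name != name:
            messages.append(
                f"Record name {name!r} occurs more than once, "
                f"replaced with {new_name!r}"
            )

        used_names.add(new_name)
        seen_sequences[normalized] = new_name
        repaired.append((new_name, normalized))

    return repaired, messages
```

Collecting all messages in a single pass, instead of stopping at the first problem, is what avoids having to re-run init repeatedly.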