meglecz closed this issue 2 years ago
One question. For the "habitat" delete part, if a cutoff other than 0.5 must be used, do we need a parameter?
Another question. What does N_i' mean? The sum of all other variants in the same habitat h? The sum of variant i in the other habitats?
My other suggestions are the following. I propose that this tool writes a "delete occurrences" file without "keep occurrences". The "keep occurrences" file is a different file created by the user. The following "vtam optimize" command will take two options --occurrences_keep and --occurrences_delete.
*One question. For the "habitat" delete part, if a cutoff other than 0.5 must be used, do we need a parameter?* Ideally yes. We can call it min_habitat_proportion, habitat_proportion, or habitat_p
*What does N_i' mean?* The total number of reads of variant i in the run-marker combination (N_i) minus the number of reads of variant i in samples where habitat is 'NA' (negative controls).
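To make the definition concrete, here is a minimal sketch of how N_i' could be computed. It is not VTAM code: the record layout, the `reads_excluding_na` function, and the read counts are hypothetical, and it assumes that negative controls are exactly the samples whose habitat is 'NA'.

```python
from collections import defaultdict

# Hypothetical read counts: (run, marker, sample, variant, read_count).
reads = [
    ("run1", "MFZR", "tpos1_run1", "v1", 100),
    ("run1", "MFZR", "tnegtag_run1", "v1", 5),
    ("run1", "MFZR", "14ben01", "v1", 40),
    ("run1", "MFZR", "14ben01", "v2", 7),
    ("run1", "MFZR", "tnegtag_run1", "v3", 3),
]

# Habitat of each (run, marker, sample); 'NA' marks a negative control.
habitat = {
    ("run1", "MFZR", "tpos1_run1"): "terrestrial",
    ("run1", "MFZR", "tnegtag_run1"): "NA",
    ("run1", "MFZR", "14ben01"): "freshwater",
}


def reads_excluding_na(reads, habitat):
    """Return N_i' per (run, marker, variant): reads outside 'NA' samples."""
    totals = defaultdict(int)
    for run, marker, sample, variant, count in reads:
        key = (run, marker, variant)
        if habitat[(run, marker, sample)] == "NA":
            totals[key] += 0  # keep the key so negative-only variants get 0
        else:
            totals[key] += count
    return dict(totals)


n_prime = reads_excluding_na(reads, habitat)
```

In this toy data, variant v1 has N_i = 145 reads in run1-MFZR, of which 5 are in the negative control, so N_i' = 140.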
*My other suggestions are the following. I propose that this tool writes a "delete occurrences" file without "keep occurrences". The "keep occurrences" file is a different file created by the user. The following "vtam optimize" command will take two options --occurrences_keep and --occurrences_delete.* That is a possibility, but generally you do not like adding new parameters. I would prefer to have the keep and delete occurrences in the same file (known_occurrences.tsv), to keep the command as simple as possible.
When using several different mocks and negatives, plus samples from different habitats, preparing known_occurrences.tsv is tedious.
It would be nice to prepare a known_occurrences.tsv file automatically when running filter. The user could revise this file manually afterwards, but it would serve as a solid base.
Plan: run filter with two options:

- --mock_composition: requires a TSV file in the format of known_occurrences.tsv, but listing only the keep occurrences in all mocks
- --sample_types: requires a TSV file as in the example below. All sample-run-marker combinations should be listed

| Marker | Run | Sample | Sample_type | habitat |
|---|---|---|---|---|
| MFZR | run1 | tpos1_run1 | mock | terrestrial |
| MFZR | run1 | tnegtag_run1 | negatif | NA |
| MFZR | run1 | 14ben01 | real | freshwater |
| MFZR | run1 | 14ben02 | real | freshwater |

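
As an illustration of how such a sample_types file might be read and checked for completeness, here is a rough sketch. The helper names `read_sample_types` and `missing_combinations` are hypothetical, not part of VTAM, and the extra sample `14ben03` is made up to show a missing entry. The column names and values are taken verbatim from the example above.

```python
import csv
import io

# The example sample_types file from this issue, as tab-separated text.
SAMPLE_TYPES_TSV = (
    "Marker\tRun\tSample\tSample_type\thabitat\n"
    "MFZR\trun1\ttpos1_run1\tmock\tterrestrial\n"
    "MFZR\trun1\ttnegtag_run1\tnegatif\tNA\n"
    "MFZR\trun1\t14ben01\treal\tfreshwater\n"
    "MFZR\trun1\t14ben02\treal\tfreshwater\n"
)


def read_sample_types(handle):
    """Return {(marker, run, sample): (sample_type, habitat)}."""
    table = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        key = (row["Marker"], row["Run"], row["Sample"])
        if key in table:
            raise ValueError(f"duplicate entry for {key}")
        table[key] = (row["Sample_type"], row["habitat"])
    return table


def missing_combinations(table, observed):
    """Combinations present in the data but absent from the file."""
    return sorted(set(observed) - set(table))


sample_types = read_sample_types(io.StringIO(SAMPLE_TYPES_TSV))

# Hypothetical combinations found in the filtered data.
observed = [("MFZR", "run1", "14ben01"), ("MFZR", "run1", "14ben03")]
missing = missing_combinations(sample_types, observed)
```

A check like this would let filter stop early with a clear message when a sample-run-marker combination is not listed.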
Based on these files prepare a known_occurrences.tsv with keep and delete occurrences as follows:
Keep occurrences: Copy of mock_composition.tsv
Delete Occurrences: