ypriverol closed this issue 3 years ago
The issue is resolved in commit 9c067b0.
I have made it more generic by adding a new variable named filter_column, which assigns the column name to be used for filtering. The mutations in the file can thus be filtered or split based on any column. I have also renamed the tissue_type and split_by_tissue parameters to accepted_values and split_by_filter_column, respectively, to match the new generic form.
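For readers following along, here is a minimal sketch of how filtering and splitting on an arbitrary column could work. It is not the code from the commit: the function name `filter_mutations`, the `"all"` wildcard, the column name `Primary site`, and the placeholder rows are assumptions made for illustration. Only the parameter names filter_column, accepted_values and split_by_filter_column come from the comment above.

```python
import csv
import io
from collections import defaultdict

# Placeholder table; gene, tissue and sample values are made up.
SAMPLE_TSV = (
    "Gene name\tPrimary site\tSample name\n"
    "GENE_A\ttissue_x\tCELL_1\n"
    "GENE_B\ttissue_x\tCELL_1\n"
    "GENE_C\ttissue_y\tCELL_2\n"
)


def filter_mutations(rows, filter_column, accepted_values=("all",),
                     split_by_filter_column=False):
    """Hypothetical helper: keep rows whose value in `filter_column` is in
    `accepted_values` ("all" is assumed to mean "accept everything").

    Returns a dict of group name -> rows. With split_by_filter_column=True
    there is one group per distinct column value, otherwise a single
    group named "all".
    """
    accept_all = "all" in accepted_values
    groups = defaultdict(list)
    for row in rows:
        value = row[filter_column]
        if not accept_all and value not in accepted_values:
            continue  # value not requested, drop the row
        key = value if split_by_filter_column else "all"
        groups[key].append(row)
    return dict(groups)


rows = list(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"))

# Filter on one column, keep everything in one group.
by_sample = filter_mutations(rows, "Sample name", accepted_values=["CELL_1"])

# Accept every value of another column, but split the output per value.
by_tissue = filter_mutations(rows, "Primary site", split_by_filter_column=True)
```

Because the column is passed in by name, the same call covers a tissue column or any other column in the file without a dedicated parameter for each.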
Also, there was an issue with duplicate keys in the FASTA headers from COSMIC. By definition, FASTA files should have unique record IDs for parsing to work. To overcome this, I have re-implemented the parsing function so that it works with multiple entries for the same gene.
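As an illustration of the duplicate-ID problem, the sketch below reads FASTA lines into a dict of lists, so that repeated record IDs are collected rather than overwritten. This is not the re-implemented function from the commit: the name `read_fasta_allow_duplicates` is hypothetical, and treating the first whitespace-delimited token of the header as the record ID is an assumption.

```python
from collections import defaultdict


def read_fasta_allow_duplicates(lines):
    """Hypothetical parser: map each record ID to a list of sequences.

    A plain dict keyed by ID would silently keep only the last record
    when an ID repeats; appending to a list keeps every entry.
    """
    records = defaultdict(list)
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                # Close the previous record before starting a new one.
                records[header].append("".join(chunks))
            # Assumption: the ID is the first token after ">".
            header = line[1:].split()[0]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header].append("".join(chunks))  # flush the last record
    return dict(records)


# Placeholder input with one repeated ID.
fasta_lines = [
    ">GENE1 first entry",
    "ACGT",
    "AC",
    ">GENE2",
    "TTTT",
    ">GENE1 second entry",
    "GGGG",
]
parsed = read_fasta_allow_duplicates(fasta_lines)
```

Here both GENE1 sequences survive under the same key, which an index keyed on unique IDs would not allow.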
@husensofteng:
I have implemented the download of the COSMIC cell-lines mutations file (https://github.com/bigbio/py-pgatk/commit/03fccf450ce736949e2f41a3da01d939cab5f69b). It would be great if we could also implement filtering by Sample name, which is the cancer cell line used. It can be used in the same way as the tissue filter in the tumor mutations file.