bigbio / py-pgatk

Python tools for proteogenomics analysis toolkit
Apache License 2.0

COSMIC mutations for cell lines #28

Closed ypriverol closed 3 years ago

ypriverol commented 3 years ago

@husensofteng :

I have implemented the download of the COSMIC cell-line mutations file (https://github.com/bigbio/py-pgatk/commit/03fccf450ce736949e2f41a3da01d939cab5f69b). It would be great if we could implement:

husensofteng commented 3 years ago

The issue is resolved in commit 9c067b0.

I have made it more generic by adding a new variable, filter_column, which names the column to be used for filtering. The mutations in the file can therefore be filtered or split on any column, not only on tissue type. I have also renamed the tissue_type and split_by_tissue parameters to accepted_values and split_by_filter_column, respectively, so that they match the generic form.
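To illustrate how the three parameters could interact, here is a minimal sketch. It is not the code from the commit: the function name filter_rows, the tab-separated layout, the column names, the sample rows, and the case-insensitive matching are all assumptions made for the example. Only filter_column, accepted_values and split_by_filter_column come from the description above.

```python
import csv
import io
from collections import defaultdict


def filter_rows(handle, filter_column, accepted_values=None,
                split_by_filter_column=False):
    """Hypothetical helper: filter and optionally split tab-separated rows.

    Rows whose value in `filter_column` is not in `accepted_values` are
    dropped (no filtering when `accepted_values` is None or empty). With
    `split_by_filter_column`, rows are grouped by that value; otherwise
    they all go into a single group named "all".
    """
    # Assumption: values are compared case-insensitively.
    accepted = {v.lower() for v in accepted_values} if accepted_values else None
    groups = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        value = row[filter_column]
        if accepted is not None and value.lower() not in accepted:
            continue
        key = value if split_by_filter_column else "all"
        groups[key].append(row)
    return dict(groups)


# Made-up sample data; the column names are illustrative only.
SAMPLE = (
    "Gene name\tPrimary site\tMutation AA\n"
    "TP53\tlung\tp.R175H\n"
    "KRAS\tpancreas\tp.G12D\n"
    "EGFR\tlung\tp.L858R\n"
)

groups = filter_rows(io.StringIO(SAMPLE), "Primary site",
                     accepted_values=["lung"], split_by_filter_column=True)
for name, rows in groups.items():
    print(name, [r["Gene name"] for r in rows])
```

Because the column is passed by name, the same call works for any other column in the file, which is the point of the generic form.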

Also, there was an issue with duplicate keys in the FASTA headers from COSMIC. By definition, FASTA files should have unique record IDs for parsing to work. To overcome this, I have re-implemented the parsing function so that it works with multiple entries for the same gene.
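One way to tolerate repeated record IDs is to map each ID to a list of sequences instead of a single sequence. The sketch below shows that idea only; it is not the parsing function from the commit, and the function name, the choice of the first header token as the ID, and the sample records are assumptions.

```python
import io
from collections import defaultdict


def read_fasta_allow_duplicates(handle):
    """Hypothetical parser: return {record_id: [sequence, ...]}.

    A repeated ID appends to the list for that ID, so no earlier
    record with the same ID is overwritten.
    """
    records = defaultdict(list)
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records[header].append("".join(chunks))
            # Assumption: the ID is the first whitespace-separated token.
            header = line[1:].split()[0]
            chunks = []
        elif header is not None:
            chunks.append(line)
    # Flush the last record in the file.
    if header is not None:
        records[header].append("".join(chunks))
    return dict(records)


# Made-up records: GENE_A appears twice with different sequences.
SAMPLE_FASTA = (
    ">GENE_A transcript_1\n"
    "ATGGAG\n"
    "GAGCCG\n"
    ">GENE_B transcript_1\n"
    "ATGACT\n"
    ">GENE_A transcript_2\n"
    "ATGGAGTTT\n"
)

records = read_fasta_allow_duplicates(io.StringIO(SAMPLE_FASTA))
for record_id, sequences in records.items():
    print(record_id, sequences)
```

A dictionary keyed on the ID alone would silently keep only the last GENE_A entry, which is the failure mode that unique-ID parsers run into with this file.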