Hi, thank you very much for this paper and codebase. Sorry if my question is very basic, but I am curious about the DMS_filename entries referenced in DMS_substitutions.csv. I have been trying to find the exact files named in that CSV, but I can't locate most of the DMS datasets that would contain the mutated sequences for each referenced protein. For example, I am looking for CAS9_STRP1_Spencer_2017_positive.csv, which I expect to contain thousands of mutated sequences along with the target sequence and possibly the LFC after positive selection. Where can we download the files named in that column?
The reason I am asking is that I want to run the notebook code, which needs a DMS_reference_file like those .csv documents. I know each of the reference papers provides its data in the supplementary material, but the files there are not named in the same way. It also seems that you applied specific criteria, for example positive selection for Cas9, which might not be the only functional criterion in the reference papers.
I could only find some .csv files from this link, but it seems to be outdated, as the ProteinGym paper includes many more DMS substitution files.
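
To make the gap concrete, here is a minimal sketch of how one could list which DMS_filename entries are absent from a local download. It assumes only that DMS_substitutions.csv has a DMS_filename column, as described above; the function name `find_missing_dms_files` and the folder paths in the usage comment are hypothetical and not part of the ProteinGym codebase.

```python
import csv
from pathlib import Path


def find_missing_dms_files(reference_csv, data_dir):
    """Return DMS_filename entries from reference_csv that are not files in data_dir.

    Hypothetical helper for illustration; not part of the ProteinGym codebase.
    """
    data_dir = Path(data_dir)
    missing = []
    with open(reference_csv, newline="") as handle:
        for row in csv.DictReader(handle):
            # Skip rows where the column is empty or absent.
            name = (row.get("DMS_filename") or "").strip()
            if name and not (data_dir / name).is_file():
                missing.append(name)
    return missing


# Example usage (both paths are assumptions about a local layout):
# missing = find_missing_dms_files("DMS_substitutions.csv", "downloaded_dms/")
# print(len(missing), "files not found, e.g.", missing[:5])
```

Running this against the folder downloaded from the link above would show exactly which assay files still need to be obtained.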