Ninjani closed this pull request 4 years ago
@all-contributors please add @Ninjani for code
@gtauriello I've put up a pull request to add @Ninjani! :tada:
@all-contributors please add @akdel for content
@gtauriello I've put up a pull request to add @akdel! :tada:
Thanks for the code so far. For the annotation mapping: do you plan to have a list of PDB IDs with offsets for those mappings? Also, would it be possible to keep a (possibly rounded) representation of the numeric value in the annotation text? This would improve browsing through the results...
I will keep this PR open so you can just easily keep pushing to your fork and we can comment on changes here...
Yes we were planning to have lists of PDB IDs mapped to the corresponding sequence, as well as pre-defined groups of PDB IDs that make sense to compare (e.g. proteins from SARS-Cov2 with different ligands, or proteins from SARS-Cov2 with SARS-Cov), will do this tonight. Good idea about including the numbers in the text, we'll add that. Thanks!
The code is quite usable now. Here's a list describing the current functionality:
Extract experimental PDBs associated with a given UniProt ID and protein name. For instance, this lets you obtain all the PDB IDs related to the 3C-like proteinase in P0DTD1 (Replicase polyprotein 1ab), see example_usage.py. You can also retrieve PDB IDs based on available annotations on the UniProt sequence. The code maps the PDB residue index to the reference UniProt index (using muscle to align; maybe there's a better way?).
Using above functionality, it's possible to retrieve a relevant set of PDBs and create an ensemble of structures. The ensemble can be used to compute:
For a single PDB ID, we predict (according to elastic network models)
Points 2 and 3 can be written out as structure annotations and visualized on the beta SWISS-MODEL annotation website (see example_usage.py and the annotations folder for examples).
For the code: could you add a little README to the folder to describe the functionality (and also otherwise check if anything else is missing according to the contribution guidelines)? It looks like great work. Thanks.
Also if there are useful reusable components in here for other PRs, we could move those to the utils folder and do a separate PR to merge them into master?
On SIFTS mappings: if they are correct you shouldn't have to run any alignments with muscle but can use the residue numbering and just use the difference between "start" and "unp_start" as offset. So basically a continuous stretch in the PDB-seqres (which is what the residue numbering in mmCIF files is linked to) should match a continuous stretch in UniProt-sequence. Unfortunately, the data in "best_structures" SIFTS-API (which you use) doesn't handle the case where a continuous PDB-stretch maps to a discontinuous stretch in UniProt. As far as I can see for cov-structures, this only happens in 6lxt which we have blacklisted for our cov-page (also for various other reasons). The per-pdb SIFTS-API does handle such cases (see here for 6lxt and our display in SMR which uses little 'T' symbols to mark jumps in sequence in the alignment).
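In code, the continuous-segment assumption described above amounts to a one-line offset shift (illustrative helper, not the thread's actual code; the numbers come from the 6yb7 segment quoted later in the thread):

```python
def unp_position(resnum, start, unp_start):
    """Map a seqres-linked PDB residue number to a UniProt position,
    assuming the SIFTS segment is a single continuous stretch."""
    return unp_start + (resnum - start)

# 6yb7 chain A (best_structures API): start=1, unp_start=3264, unp_end=3569
assert unp_position(1, 1, 3264) == 3264
assert unp_position(306, 1, 3264) == 3569
```

This only holds when the PDB stretch maps to a single continuous UniProt stretch, which is exactly the case the best_structures API cannot express otherwise.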
In terms of PDB structures on our cov-page vs SIFTS I noticed the following:
We added a README. I think parse_pdbe.py would fit better in utils, as it can be used to obtain a dictionary mapping from a PDB ID to the UniProt reference sequence. Will make another pull request for this.
About the mapping, we did start out by trying to directly use the given offsets, but it didn't work for some proteins. E.g. for 6yb7 chain A, the information from PDBe is:
{'end': 306,
'chain_id': 'A',
'pdb_id': '6yb7',
'start': 1,
'unp_end': 3569,
'coverage': 0.043,
'unp_start': 3264,
'resolution': 1.25,
'experimental_method': 'X-ray diffraction',
'tax_id': 2697049}
pd.parseCIF("6yb7", chain="A").select(f"resnum 1:306").select("calpha").getSequence()
'SGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDVVYCPRHVICTSEDMLNPNYEDLLIRKSNHNFLVQAGNVQLRVIGHSMQNCVLKLKVDTANPKTPKYKFVRIQPGTFSVLACYNGSPSGVYQCAMRPNFTIKGSFLNGSCGSVGFNIDYDCVSFCYMHHMELPTGVHAGTDLEGNFYGPFVDRQTAQAAGTDTTITVNVLAWLYAAVINGDRWFLNRFTTTLNDFNLVAMKYNYEPLTQDHVDILGPLSAQTGIAVLDMCASLKELLQNGMNGRTILGSALLEDEFTPFDVVRQCSGVTF'
Compared to the reference (note the discrepancies: an extra Q after "QPG" and a trailing Q):
mapper.uniprot_sequence[3264 - 1: 3569]
'SGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDVVYCPRHVICTSEDMLNPNYEDLLIRKSNHNFLVQAGNVQLRVIGHSMQNCVLKLKVDTANPKTPKYKFVRIQPGQTFSVLACYNGSPSGVYQCAMRPNFTIKGSFLNGSCGSVGFNIDYDCVSFCYMHHMELPTGVHAGTDLEGNFYGPFVDRQTAQAAGTDTTITVNVLAWLYAAVINGDRWFLNRFTTTLNDFNLVAMKYNYEPLTQDHVDILGPLSAQTGIAVLDMCASLKELLQNGMNGRTILGSALLEDEFTPFDVVRQCSGVTFQ'
So we decided to play it safe and make an alignment after selecting by the given offsets. We could of course also remove this and throw an error in these cases -- what do you think?
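A quick stdlib illustration of why a plain offset slice fails for 6yb7 (a sketch using fragments of the two sequences quoted above, not the PR's muscle-based code): difflib shows the reference carries an inserted residue that a fixed offset cannot absorb.

```python
import difflib

# Fragments around the first discrepancy between structure and reference
pdb_seq = "KFVRIQPGTFSVL"    # from the 6yb7 chain A selection
unp_seq = "KFVRIQPGQTFSVL"   # from the UniProt reference slice

ops = difflib.SequenceMatcher(None, pdb_seq, unp_seq, autojunk=False).get_opcodes()
# The reference has an extra 'Q' after "QPG"; every position past it
# would be mapped one residue off by pure offset arithmetic.
inserts = [op for op in ops if op[0] == "insert"]
```

An alignment (muscle, or any pairwise aligner) absorbs such insertions, which is why the offset-only mapping was unsafe here.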
Ok so analyzing pd.parseCIF("6yb7", chain="A").select(f"resnum 1:306").select("calpha").getSequence()
I see two issues:
So in general I don't think it's a good idea to map residue-indices to UniProt-sequence-positions but rather to map residue-numbers to UniProt-sequence-positions.
And to be honest, there are way more special cases to deal with when generalizing SIFTS mapping readings. I am doing some simplifications here to make this easier, but the only clean source of data for the mapping is the XML files from SIFTS (see links here). That's what we use for the mappings in our repository. In the XML, each residue is listed separately, so you can generate a clean map of residue numbers to UniProt sequence positions.
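A minimal stdlib sketch of that per-residue XML parsing (assuming the SIFTS layout of `residue` elements with `crossRefDb` children; the toy fragment below is abridged and un-namespaced, while real files are namespaced, hence the tag-suffix matching):

```python
import xml.etree.ElementTree as ET

def residue_map(xml_text):
    """Build {PDBe residue number -> UniProt position} from a SIFTS
    per-entry XML. Residues without a UniProt cross-reference
    (e.g. unobserved ones) are simply skipped."""
    mapping = {}
    for elem in ET.fromstring(xml_text).iter():
        # match by suffix so a namespace prefix doesn't get in the way
        if not elem.tag.endswith("residue") or elem.get("dbSource") != "PDBe":
            continue
        for ref in elem:
            if ref.tag.endswith("crossRefDb") and ref.get("dbSource") == "UniProt":
                mapping[int(elem.get("dbResNum"))] = int(ref.get("dbResNum"))
    return mapping

# Abridged fragment in the spirit of the 6yb7 entry
TOY = """<entry><entity type="protein" entityId="A"><segment><listResidue>
  <residue dbSource="PDBe" dbResNum="1">
    <crossRefDb dbSource="PDB" dbResNum="1" dbChainId="A"/>
    <crossRefDb dbSource="UniProt" dbAccessionId="P0DTD1" dbResNum="3264"/>
  </residue>
</listResidue></segment></entity></entry>"""
```

Because every residue is listed individually, gaps and discontinuous segments fall out of the map naturally instead of breaking an offset.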
For point 2: the PDB seqres can have mismatches compared to the UniProt sequences. That's ok and can happen for various reasons during generation of the structures. These are annotated as "conflict" in the XML and you can also spot them on PDBe-KB. This also means that at those positions you probably shouldn't map any annotations where sidechains are of relevance. So it's good that you spot them, but it's expected to happen... As an example, 6vyb has conflicts at UniProt positions 986 & 987 (PDB resnums 1005 & 1006): see the links to PDBe-KB and SMR.
Okay, finally figured out that the root of all the issues was that PDBe numbering is sometimes different from PDB/CIF resnum numbering, e.g.
PDB CHAIN SP_PRIMARY RES_BEG RES_END PDB_BEG PDB_END SP_BEG SP_END
6w01 A P0DTD1 25 370 1 346 6452 6797
where the PDBe start is 25 but the PDB resnum start is 1.
Fixed this now by using the uniprot_segments_observed.tsv.gz file on the SIFTS page, which maps from PDBe to PDB to UniProt. Also made another pull request (#38) to add this PDBe mapping script to utils.
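The offset fix can be sketched as follows (hypothetical helper names; the 6w01 row above gives RES_BEG=25, PDB_BEG=1, SP_BEG=6452, so the two numbering schemes start 24 apart but land on the same UniProt positions):

```python
def pdb_resnum_to_uniprot(resnum, pdb_beg, sp_beg):
    """Author-provided PDB residue number -> UniProt position,
    using one segment row of uniprot_segments_observed.tsv."""
    return sp_beg + (resnum - pdb_beg)

def pdbe_index_to_uniprot(res_idx, res_beg, sp_beg):
    """Seqres-linked PDBe index (RES_BEG..RES_END) -> UniProt position."""
    return sp_beg + (res_idx - res_beg)

# 6w01 chain A: PDB resnum 1 and PDBe index 25 both map to UniProt 6452
assert pdb_resnum_to_uniprot(1, 1, 6452) == pdbe_index_to_uniprot(25, 25, 6452) == 6452
```

This still assumes continuous segments within each row, which is exactly the caveat discussed earlier in the thread.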
@Ninjani Using the PDB resnum is probably a bad idea. PDB entries in mmCIF format have two residue numberings. The historical "PDB resnum" is the "author provided" numbering that was used in PDB format. These numbers are fairly arbitrary, precisely because they are author provided. In mmCIF you have both the "author provided" numbers (labeled "auth_seq_id") and the new, seqres-linked ones (labeled "label_seq_id"). The latter is preferred since it is consistent across the data stored in the PDB entries (it is also what the SIFTS files refer to as RES_BEG and RES_END). Often enough the two are reasonably consistent (or just shifted by an offset), but there is no guarantee whatsoever that the "PDB resnum" follows any kind of continuous numbering.
I hope that ProDy also reads the seqres-linked residue numbers, as it should for proper parsing of mmCIF files. If it does, please don't use the PDB resnum but use the seqres-linked residue numbers instead.
And I would still argue that it's best to parse the XML file (which, if really necessary, also includes the PDB resnum).
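For reference, the two numbering schemes sit side by side in the `_atom_site` loop of an mmCIF file. A minimal stdlib sketch to pull them out (a toy parser, not a real mmCIF reader: it assumes whitespace-separated rows and no quoted or multi-line values):

```python
def atom_site_numbering(cif_text):
    """Collect auth_seq_id -> label_seq_id for CA atoms from the
    _atom_site loop of an mmCIF file (simplified, see caveats above)."""
    fields, idx, mapping = [], {}, {}
    for line in cif_text.splitlines():
        s = line.strip()
        if s.startswith("_atom_site."):
            fields.append(s.split(".", 1)[1].split()[0])
            continue
        if not fields:
            continue          # still before the _atom_site header
        if not s or s.startswith(("#", "loop_", "_")):
            break             # end of the data rows
        if not idx:
            idx = {name: i for i, name in enumerate(fields)}
        cols = s.split()
        if cols[idx["label_atom_id"]] == "CA":
            mapping[int(cols[idx["auth_seq_id"]])] = int(cols[idx["label_seq_id"]])
    return mapping

# Toy loop where the author numbering is shifted by 24, as in 6w01
TOY_CIF = """\
loop_
_atom_site.group_PDB
_atom_site.label_atom_id
_atom_site.label_seq_id
_atom_site.auth_seq_id
ATOM CA 1 25
ATOM CA 2 26
#
"""
```

A real parser (gemmi, Biopython's MMCIF2Dict, etc.) should be used in practice; the point is only that both `auth_seq_id` and `label_seq_id` are there to read.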
@gtauriello @Ninjani
Using the master branch from Ninjani's fork, I have incorporated the solvent accessibility code into the single structure annotation code. It creates a text annotations file for upload to SWISS-MODEL as well as a single CSV of all annotation values. A CSV file is not currently created for the ensemble annotations. I was uncertain about testing the upload, but happy to try if it won't cause problems.
I've also added a conda env file and accessory scripts, although the env file could probably use some cleanup.
My code is in the branch mlgill/08_surface_access_interface. I will be offline for much of the remainder of the day but happy to fix any issues later this weekend.
@mlgill No worries about uploading the annotations. Seems to work fine (used your Relative_Solvent_Accessibility.txt): https://beta.swissmodel.expasy.org/repository/covid_annotation_project/WLWmmq
I have some questions on what I am looking at though (might need more text in the README):
@gtauriello Good catch -- I'd forgotten that DSSP will use all atoms in the PDB file to calculate solvent accessibility, regardless of which ones are selected in ProDy. I've updated the code so that it will create a new PDB file for DSSP if a subset of atoms is selected. Thus, the results returned will correspond to the solvent accessibility of the selected atoms, assuming ONLY the selected atoms are present. I have also updated the README.
Another alternative would be to calculate two sets of solvent accessibilities (one with all atoms and one with only the selections). However, I felt that was an unnecessary level of complexity since it's possible to achieve the same result by rerunning the calculation without a chain selection.
@mlgill Could you elaborate (also in the README) on what the "selected atoms" are? I suppose with "selected atoms" you mean that you select all atoms in a single chain which would be the "solvent accessibility of a single chain in isolation"?
I want to stress that in the case of solvent accessibility (and possibly also other annotations), for proteins which are known to be found in oligomeric complexes (as is the case for nsp7), it is very important to consider both the protein chain in isolation and the protein complex. Proteins are rarely in isolation in cells. So if we focus too much on chains in isolation, we risk providing results which are of limited use...
Also for the README (for all annotations) it is important to describe what we find in the files in the annotations-subfolder. Currently for instance it's hard to understand what PDB files were used to compute those annotations. Basically it should be possible (at least for someone like me) to skip the code, look at the available annotations and know what I am looking at only using the information in the README...
@gtauriello Thanks for the feedback. I will work on the updates when I am able to. In the meantime, could you please elaborate on the following:
Also I still don't see a comment for the Relative_Solvent_Accessibility.txt file stating which PDB file was used to get those annotations. If I skip the code and look at the available annotations, I should be able to know what I am looking at only using the information in the README...
Happy to add this. I'd note that this is also missing from the output files @Ninjani created (e.g. here) -- I was following his format since I don't have a lot of experience with SWISS-MODEL. Accordingly, I'm going to assume that the output files from the methods he implemented should also be updated.
Is there documentation somewhere on the comment format for the annotations file? I did not see anything here.
I want to stress that in the case of solvent accessibility (and possibly also other annotations), for proteins which are known to be found in oligomeric complexes (as is the case for nsp7), it is very important to consider both the protein chain in isolation and the protein complex. Proteins are rarely in isolation in cells. So if we focus too much on chains in isolation, we risk providing results which are meaningless for practical purposes...
I understand the concern regarding monomers vs complexes. However, this is an issue where there is unlikely to be a one-size-fits-all solution, at least not without adding a lot of complexity to the code and thus reducing usability by the target audience.
As I mentioned above, I see two potential straightforward solutions to the issue of monomer / oligomers:
Are either of these solutions acceptable? If not, could you please elaborate on what it is you'd like to see? As it currently stands, I am left guessing at what it is you're looking for.
First of all: sorry for the original version of my comment sounding a bit overly harsh. I noticed after I clicked on "comment" and edited it... ;-)
Also I generalized my comment on annotations as I completely agree that this is missing for all annotations. I would use the README to explain what can be found in those files. We didn't foresee that the annotations would have comments and hence I don't think it is possible to add the comments there. But I think it's perfectly fine to have the info in a README next to those files...
If the current code can be easily run without a chain selection and do the surface accessibility for the whole complex that's fine I think. Thanks for elaborating on it. Would be good to describe it in the README (e.g. "we have precomputed results for chain in isolation but if you run the code like ... you get the results for the protein complex"). And sorry if I don't have too much time to look at the code and what it does exactly. I am mostly judging the visible output and documentations.
@gtauriello No problem at all -- these are the challenges associated with hackathons, especially distributed ones. :D
Your additional comments help clarify what you're looking for -- thank you! I'll make the fixes as I understand them later today and let you know when they're finished.
@mlgill @gtauriello Just wanted to add that one option we used for including the relevant PDB name is to add it to the title while posting the annotation to the SWISS-MODEL website (see here). This shows up when clicking the generated link for the annotation but is not shown on the page with the structure. Will add a README too as suggested.
@Ninjani That's a good point. I want to add code to create an information output file -- is this what you meant by adding a "README", since we already have one in our str_annotation directory?
Also, I made some updates to the README in our directory here, including adding the names of the output files for each of the features. However, I'm not clear on which file contains the following: RMSD of each structure to the reference structure (the first PDB ID in list)
. Could you double check these file names and let me know what's missing?
Finally, I've been pulling your branch into mine (https://github.com/mlgill/covid-19-Annotations-on-Structures/blob/mlgill/08_surface_access_interface), but I have additional code that I've committed. If we use this MR, I'll either need to push my code to your branch (if you'll give me access) or I'll need to open a separate one. Let me know which you'd prefer.
Personally I would prefer if you merge the two forks and have both changes in this PR. So that would either require access for @mlgill in the master branch of @Ninjani fork or a PR of the mlgill-branch into the Ninjani-fork. Probably the former is easier...
Thanks for the README-updates. I think it's going in the right direction of clarifying the content of those folders...
Agreed, I think it makes sense to have everything here so it's with the discussion. @Ninjani Let me know your preference. I'll pick this up again tonight when I've gotten a few other things done.
Thanks for the discussion, all. Have really enjoyed this and learned a lot.
@Ninjani That's a good point. I want to add code to create an information output file -- is this what you meant by adding a "README", since we already have one in our str_annotation directory?
Yes this is what I meant, a file that just maps each output annotation filename to whatever additional information (PDB ID, other parameters etc.) was used to create it.
Also, I made some updates to the README in our directory here, including adding the names of the output files for each of the features. However, I'm not clear on which file contains the following: RMSD of each structure to the reference structure (the first PDB ID in list). Could you double check these file names and let me know what's missing?
Ah, you're right -- I just realized I never wrote this information out to an annotation file because I was unsure how to annotate it. It applies to the entire reference structure, since each RMSD value is calculated across the full structure but compared to each other structure in the ensemble. I wasn't sure whether to make one file per structure in the ensemble or just a list of (PDB ID, RMSD) pairs attached to (all the residues in) the reference structure. Thoughts?
Finally, I've been pulling your branch into mine (https://github.com/mlgill/covid-19-Annotations-on-Structures/blob/mlgill/08_surface_access_interface), but I have additional code that I've committed. If we use this MR, I'll either need to push my code to your branch (if you'll give me access) or I'll need to open a separate one. Let me know which you'd prefer.
I've given you access to my fork now! I agree a merge seems best.
Thanks a lot for everything, this has been great! I'll try to finish up the few remaining tasks associated with this tonight or tomorrow.
@Ninjani Thanks for the updates and for giving me access to your fork. I'll push my code there when I return to this tonight.
For the missing RMSD data, perhaps it's best to diverge from the format imposed by SWISS-MODEL since this is unlikely to be an annotation. A CSV could be created with the upper triangle of a matrix to store the pairwise RMSD values (assuming I understood correctly what you want)? Thoughts?
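One stdlib-only way such a CSV could look (a sketch with made-up helper names; the real code would pull coordinates from the ProDy ensemble, and this assumes the structures are already superposed):

```python
import csv
import math

def rmsd(a, b):
    """Plain RMSD between two equal-length lists of (x, y, z) tuples,
    assuming the coordinate sets are already superposed."""
    n = len(a)
    return math.sqrt(sum((p - q) ** 2
                         for u, v in zip(a, b)
                         for p, q in zip(u, v)) / n)

def write_pairwise_rmsd(coords_by_pdb, path):
    """Write the upper triangle of the pairwise RMSD matrix as CSV,
    one (pdb_i, pdb_j, rmsd) row per unordered pair."""
    ids = sorted(coords_by_pdb)
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["pdb_i", "pdb_j", "rmsd"])
        for i, pi in enumerate(ids):
            for pj in ids[i + 1:]:
                writer.writerow(
                    [pi, pj, f"{rmsd(coords_by_pdb[pi], coords_by_pdb[pj]):.3f}"])
```

Storing only the upper triangle keeps the file small and avoids the question of which structure is the "reference" for each pair.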
@mlgill Yes that makes a lot of sense, I'll do that. Thanks!
@mlgill the .vscode folder looks like local settings for you. Would it be ok for you to add it to .gitignore and remove the folder from the git repository?
@Ninjani @gtauriello Just a heads up that I am trying to fix Ninjani's master branch so that all my changes are squashed into a single commit at the end. The way that I handled the push previously did not do this and I don't like the results for the sake of reproducibility. I won't delete any branches and am testing on other branches first.
@Ninjani Using the latest version of your master, which is currently in the branch master_NINJANI due to code cleanup, I'm encountering the following error when I run the ensemble demo.
Unfortunately, this is one of the reasons I started the branch clean-up -- I thought that perhaps I'd introduced the issue somewhere and didn't catch it. Any thoughts on what might be causing this? Thanks
Exception has occurred: KeyError
('6y2g', 'B')
File "/home/mgill/code/covid-19-NINJANI/str_derived_annotations/parse_pdbe.py", line 132, in map_to_pdb
pdbe_mapping = self.all_residue_mapping[(pdb_id, chain_id)]
File "/home/mgill/code/covid-19-NINJANI/str_derived_annotations/example_usage.py", line 49, in example_ensemble
residue_mapping, _ = reference_mapper.map_to_pdb(pdb_info_list[0])
File "/home/mgill/code/covid-19-NINJANI/str_derived_annotations/example_usage.py", line 99, in <module>
example_ensemble()
@Ninjani One more issue that should probably be addressed: I think it would be good to add code that at least checks for the existence of uniprot_segments_observed.tsv and either downloads it automatically or tells users where to download it and where to put it. What do you think?
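A possible shape for that check (a sketch: the EBI URL below is an assumption and should be verified against the SIFTS download page, and `fetch` is injectable purely so the logic can be exercised offline):

```python
from pathlib import Path

# Assumed location -- double-check against the SIFTS download page
SIFTS_URL = ("https://ftp.ebi.ac.uk/pub/databases/msd/sifts/"
             "flatfiles/tsv/uniprot_segments_observed.tsv.gz")

def ensure_sifts_file(path, fetch=None):
    """Return `path`, downloading uniprot_segments_observed.tsv.gz
    first if it is not already present on disk."""
    path = Path(path)
    if not path.exists():
        if fetch is None:                     # default: plain urllib download
            from urllib.request import urlopen
            fetch = lambda url: urlopen(url).read()
        path.write_bytes(fetch(SIFTS_URL))
    return path
```

Calling this at startup gives users either an automatic download or, if the URL changes, a single obvious place to point at the right file.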
@gtauriello @Ninjani
I believe I've finished fixing the branch issues. The following is an inventory of branches:
master_MICHELLE (see below). I think there might be a way to force the commit history to be rewritten by that of another branch, but I'd need to investigate further. @Ninjani, would you like to handle the switch-over? Or grant me the ability to change/delete master?
One more issue that should probably be addressed. I think it would be good to add code that at least checks for the existence of uniprot_segments_observed.tsv and either downloads it automatically or tells users where to download it and put it in the code. What do you think?
@mlgill So this part still needs to be rewritten to use the XML files from PDBe instead of this file (see this comment thread) but I haven't yet found time to do this. I'll try again to find some time today, and once it's done we can go ahead with pull request #38 which moves the parse_pdbe script to utils. Then we'll have to refactor a couple of lines in the other files to import this script from utils instead of the current location.
I believe I've finished fixing the branch issues. The following is an inventory of branches:
I think I've granted the ability to change master. Let me know if this isn't the case, I'm still not great at figuring out github options.
Hi @Ninjani
@mlgill So this part still needs to be rewritten to use the XML files from PDBe instead of this file (see this comment thread) but I haven't yet found time to do this. I'll try again to find some time today, and once it's done we can go ahead with pull request #38 which moves the parse_pdbe script to utils. Then we'll have to refactor a couple of lines in the other files to import this script from utils instead of the current location.
Apologies, I missed this discussion in the thread from earlier -- I was just noting issues that I'd seen, in addition to the other bug with PDB mapping that I mentioned above.
I think I've granted the ability to change master. Let me know if this isn't the case, I'm still not great at figuring out github options.
Unfortunately, it doesn't look like I can make the necessary changes -- I can't see the Settings panel shown in the first picture here. Have you given me admin privileges, as described here?
@gtauriello Hi, following up on your feedback:
I generalized my comment on annotations as I completely agree that this is missing for all annotations. I would use the README to explain what can be found in those files. We didn't foresee that the annotations would have comments and hence I don't think it is possible to add the comments there. But I think it's perfectly fine to have the info in a README next to those files...
If the current code can be easily run without a chain selection and do the surface accessibility for the whole complex that's fine I think. Thanks for elaborating on it. Would be good to describe it in the README (e.g. "we have precomputed results for chain in isolation but if you run the code like ... you get the results for the protein complex"). And sorry if I don't have too much time to look at the code and what it does exactly. I am mostly judging the visible output and documentations.
I have completed the code improvements for the solvent accessibility calculation -- by default, they use the entire PDB for the calculation, but this can also be changed so that only the chain from the residue mapping is exported. In both cases, only the residues from the residue mapper are exported so that all files are uniform.
I've added quite a bit of explanation to the README as well as a file that describes the calculation e.g. here.
Note that my code is currently in this branch while I wait for admin access to @Ninjani 's repo so I can make this branch master.
Please let me know of any additional feedback -- otherwise I will look through the report draft in-progress to determine what is needed from me.
@Ninjani Using the latest version of your master, which is currently in the branch master_NINJANI due to code cleanup, I'm encountering the following error when I run the ensemble demo. Unfortunately, this is one of the reasons I started the branch clean-up -- I thought that perhaps I'd introduced the issue somewhere and didn't catch it. Any thoughts on what might be causing this? Thanks
Ah I missed this comment, I'll take a look! It could be that the TSV file doesn't have some PDB IDs; yet another reason to switch to SEQRES and XML. I'm still trying to figure this out as I don't think ProDy stores the SEQRES identifiers, but haven't been able to spend a lot of time on it.
Unfortunately, it doesn't look like I can make the necessary changes -- I can't see the Settings panel shown in the first picture here. Have you given me admin privileges, as described here?
@mlgill Hmm, I'm afraid this option is only available to organizational accounts while mine is a personal account. Maybe you could walk me through the steps I'd need to take to make the necessary changes?
@mlgill and @Ninjani for the branches: If the plan is to trash Ninjani:master and replace it with Ninjani:master_MICHELLE, we might as well do a new PR. Either way that assumes that there are no relevant changes in master compared to master_MICHELLE. Is that correct?
If you want to keep this PR and replace master, the commands to execute are these:
git checkout master_MICHELLE
git pull --rebase
git checkout master
git reset --hard master_MICHELLE
git push origin master --force
Note that this changes the commit history of master and hence might break any work done by anyone else with those checkouts. Hence, it might be a better option to just close this PR and start a new one from master_MICHELLE. It's usually better anyway to do work in branches when forking, so that you can safely force-reset master to be in sync with the main fork...
@mlgill and @Ninjani for the branches: If the plan is to trash Ninjani:master and replace it with Ninjani:master_MICHELLE, we might as well do a new PR. Either way that assumes that there are no relevant changes in master compared to master_MICHELLE. Is that correct?
If you want to keep this PR and replace master, the commands to execute are these:
git checkout master_MICHELLE
git pull --rebase
git checkout master
git reset --hard master_MICHELLE
git push origin master --force
Note that this changes the commit history of master and hence might break any work done by anyone else with those checkouts. Hence, it might be a better option to just close this PR and start a new one from master_MICHELLE. It's usually better anyway to do work in branches when forking, so that you can safely force-reset master to be in sync with the main fork...
Thanks for the nudge @gtauriello and for the reply earlier @Ninjani. I have confirmed that master_MICHELLE is up to date via a directory-level diff. I also created a backup branch of master as it currently stands, as a precaution.
@gtauriello, your command sequence seems to have worked -- thank you! I tried this sequence previously but I think I did not rebase when pulling. It now appears that Ninjani:master is up to date and the commit history is correct. I've also done another directory-level diff on the files and they match master_MICHELLE.
Given that this works, do we want to leave this PR open? It does have some important discussion about open items that I believe remain on @Ninjani 's portion of the code:
Does this sound correct to everyone? @Ninjani, maybe you could provide an estimate of when you might be able to address the first item.
If we want to trash this PR, then I'm happy to close it and create a new one with the remaining issues above. Just let me know.
@mlgill Great that the merge worked! I'm sorry for the delay on my part with addressing the XML issue, some deadlines caught up with me. I'll finish it up this weekend and double check that the bug you found is resolved. I think it makes sense to keep these issues in this PR with the relevant discussion about them, but up to you!
@Ninjani Glad that all appears good on your end -- I agree, let's keep things in this PR. Please ping me here if you need anything this weekend. Happy to test that the bug is fixed for the third item and ensure there are no other regressions.
@mlgill We wrote the XML parsing code and switched everything to use that - it does seem better right away since it handles missing residues. The changes are in the sift_xml_parser branch. I've tested it with the example_usage and everything runs fine. Also, I believe that the bug you noticed with ('6y2g', 'B') is fixed - I could obtain mappings for that pdb_id, chain_id combination with the new parse_pdbe.get_pdb_to_uniprot_mapping function. Could you take a look to make sure we didn't break anything else? If everything's good we can move forward with factoring out the parse_pdbe.py file to utils and then merge both pull requests.
@Ninjani confirmed that everything works for me! My only (optional) suggestion is to remove the cif files after they are downloaded. Up to you.
Let me know if I can help further with code testing or refactoring. Can often make time on the weekend.
@mlgill Thanks for checking! Would indeed love some help with refactoring as I want to make sure the commits and pull requests are correctly implemented. Here's what I had in mind:
- a utils branch for #38, without the str_derived_annotations folder and with parse_pdbe.py in the utils folder
- this PR then importing parse_pdbe from utils

Does this make sense?
Hi @Ninjani,
I read through your suggestions. I'm not clear why str_derived_annotations should be completely removed from the branch submitted for #38. Maybe you could explain further?
In the meantime, I've done the following as an intermediate check:
- created a branch from sift_xml_parser, called refactor_parse_pdbe, on your fork of the repo
- moved parse_pdbe.py into the utils directory
- updated the imports in example_usage.py to account for this move

If you're ok with this, I'm happy to merge this branch into master on your repo. Then we can reference it in #38 and here, merge with master here, and close both PRs.
What do you think of this as a solution? Anything I've misunderstood?
@mlgill That sounds good! I guess I was thinking in terms of having the utils PR predate this one so that we could import from utils here but on second thought that sounds pretty confusing. Thanks for everything and please go ahead! Let me know if I should do something else, this has been quite exciting and I've learned a lot.
Yeah, I agree that it's easier if we proceed as @mlgill suggested and I simply close #38 referencing the changes done here. Thanks to both of you for all your efforts.
Ok, I've merged refactor_parse_pdbe into master on Ninjani's fork (Ninjani:master). We can now merge that with master on this repo and close this PR. I've also made a note on #38 that we can close it. Let me know if something else is needed to finish this out.
@mlgill @Ninjani Just one thing: is example_usage_ORIG.py still needed? Seems like a backup of example_usage.py...
Also: I would preferably "squash and merge" it into the main master (i.e. just one commit for the whole PR). I hope that's ok for you. I can try to put together a summary commit text but if any of you feels inspired to write a short summary, that would be great of course...
@gtauriello Good catch -- that was indeed a stray backup file that I accidentally merged and forgot to delete. I have fixed it.
@Ninjani Would you like to do the honors of writing a summary paragraph since you created the foundation for our methods?
Script to generate structure and elastic network model-derived annotations from PDB files