Closed sunjrs closed 7 years ago
Currently, the drugport step retrieves the summary page for all drugs and then iterates over each drug to get its PDB information, which requires fetching a web page per drug. I assume this is the problem you're running into. We have been working on improving our data access methods, but we do not have an updated solution just yet. For the PDB files, if you have copies locally, HotSpot3D will use those before trying to acquire them from the web; web retrieval is intended for cases where local storage for PDB files is limited, although, as you've pointed out, it requires downloading. Once preprocessed, however, there shouldn't be any need to reprocess a PDB, since the preprocessing step calpro produces pair distances for peptides and other molecules. Perhaps a more immediate solution would be to attempt the web request and, on failure, wait a second or so and try again, giving up after about 5 attempts with a warning about the problematic request before moving on. What do you think? Would that address the problem you're encountering?
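The retry idea described above (wait and retry up to ~5 times, then warn and move on) could be sketched roughly as follows. This is only an illustration in Python, not HotSpot3D's actual code (which is Perl); the function name and URL handling are assumptions for the sketch.

```python
import time
import urllib.request
import urllib.error

def fetch_with_retry(url, attempts=5, delay=1.0):
    """Try to fetch `url` up to `attempts` times, pausing `delay`
    seconds between tries; return the page bytes, or None on failure."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return resp.read()
        except (urllib.error.URLError, OSError):
            time.sleep(delay)  # wait a second or so, then try again
    # all attempts failed: warn about the problematic request, then move on
    print("Warning: could not retrieve %s after %d attempts; skipping" % (url, attempts))
    return None
```

The same pattern would let a run with a spotty connection skip an unreachable page instead of dying mid-way through the drugport step.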
@sunjrs I think I have a solution for you. I'm uploading preprocessing data for GRCh37, Ensembl release 74, to Synapse, and I'm working on a description of the content and the download process. The main benefit is that, while downloading is still necessary for the preprocessing data, the content is already in the form needed to begin analysis at the search step. I'm compressing the proximity files individually so that you can choose which protein data to download; each file should be small enough (typically ~100KB) to transfer over a spotty internet connection. The drugport parsing data (<400KB) will also be available. Would this help with the problems you're experiencing? Once I have completed the upload and the help description, I will make the site available to the public and let you know.
Thank you for your reply. On one hand, if we can choose which protein data to download, at least the drugport parsing step will be easier and quicker than crawling the data from that specified HTML page as before; parsing from that page loses some extensibility, so if we can handle it ourselves, that would be exciting and a big improvement. On the other hand, it's easy to notice that the .target files are empty; that is, the step that searches for target drugs still has a lot of room to develop. If we know a PDB ID, we will want to find the corresponding target drugs, and since we have the cluster result, finding more target drugs within the same cluster is necessary; in other words, the procedure should tell us the target drugs for the residues in the same cluster. So now I am eager to get a target drugs file that is not empty. Thanks for your effort!
@sunjrs We finished uploading preprocessing data and drugport pair data to Synapse here: https://www.synapse.org/#!Synapse:syn8699796 On the page, go to the Files tab, then follow the directories to the data of interest. In the leaf directories, you will find proximity data for each protein that could be processed, so I hope you are able to find suitable data for your purposes. Additional instructions are now in the README.
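Since the proximity files are compressed individually, working with a downloaded file might look like the sketch below. The `.gz` extension and the tab-separated layout are assumptions for illustration only; check the README on the Synapse page for the actual format.

```python
import gzip

def read_proximity_file(path):
    """Yield each line of a gzipped, tab-separated proximity file
    as a list of string fields (format assumed, see README)."""
    with gzip.open(path, "rt") as fh:
        for line in fh:
            yield line.rstrip("\n").split("\t")
```

Keeping the files compressed and only decompressing the proteins of interest keeps local storage needs small, in line with the per-protein download layout described above.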
If the network is unstable, the run either hangs or gets interrupted. We want to acquire the drug-parsing data quickly; do you have a better way to save time? We tried to acquire the drug and PDB file data with a web crawler, but it requires about 18G of space, and if the drugs or PDBs change, we may need to download everything again. Do you have a better method to avoid repeated downloading? Thank you