Closed BryanFauble closed 4 months ago
Hello @BryanFauble! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
synapseutils/monitor.py:
- Line 149:89: E501 line too long (89 > 88 characters)

synapseutils/sync.py:
- Line 710:89: E501 line too long (89 > 88 characters)
- Line 1193:89: E501 line too long (91 > 88 characters)
- Line 206:89: E501 line too long (117 > 88 characters)
Just wanted to say that this overall logic looks great. If you happen to get a chain of dependencies like this:
```mermaid
flowchart TD
    A --> B
    B --> C
    C --> D
    D --> E
```
Then it will upload those entities serially. Just another thing to think about: even when people upload with syncToSynapse asynchronously, it can run into "sync"-like situations. <- Thinking out loud here - is that correct?
@thomasyu888 Yes, the chain of dependencies needing to be stored sequentially is correct. For really large dependency graphs we could make a small enhancement for the case where the referenced file metadata already exists in Synapse, since the only thing the next node in the graph needs to know is the Synapse ID of its outbound edge. However, the code isn't set up to "return early" to handle this situation. It might be too early to pre-optimize for this, but it's something to think about if there are real-world scenarios whose performance might be adversely affected.
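To illustrate the point about chains serializing even under async execution, here is a minimal sketch (the function and entity names are illustrative, not the synapseclient API): every node awaits a future holding the Synapse ID of the entity it references, so a pure chain A → B → C → D → E resolves one node at a time even though all stores are launched concurrently.

```python
import asyncio


async def store_entity(name: str, dependency) -> str:
    """Store one entity; if it references another entity, wait for that
    entity's Synapse ID first. (Hypothetical stand-in for a real store.)"""
    if dependency is not None:
        await dependency  # block until the referenced entity's ID is known
    await asyncio.sleep(0)  # simulate the upload/store call
    return f"syn-{name}"


async def store_chain(names: list[str]) -> list[str]:
    loop = asyncio.get_running_loop()
    # One future per entity, resolved with its Synapse ID once stored.
    futures = {n: loop.create_future() for n in names}

    async def run(i: int, name: str) -> str:
        # Each entity (after the first) depends on the previous one,
        # so the awaits form a chain that forces sequential storage.
        dep = futures[names[i - 1]] if i > 0 else None
        syn_id = await store_entity(name, dep)
        futures[name].set_result(syn_id)
        return syn_id

    # All stores are launched concurrently, but the dependency futures
    # serialize them anyway -- the "sync"-like situation described above.
    return await asyncio.gather(*(run(i, n) for i, n in enumerate(names)))


print(asyncio.run(store_chain(["A", "B", "C", "D", "E"])))
# ['syn-A', 'syn-B', 'syn-C', 'syn-D', 'syn-E']
```

A wide graph (many independent files) would still store in parallel; only the depth of the dependency chain bounds the serial portion.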
Draft items left to complete:

**Problem:**

**Solution:**
Uses an `asyncio.wait` statement to wait for the IDs of those files to be resolved. When those IDs are available it will then execute the save.

**Testing:**
Assume the following file/dependency diagram. In this diagram we have to start at the nodes with no outbound edges and follow the graph backwards until all files are finally saved. The use of AsyncIO allows us to use futures/awaitables to handle this waiting logic.
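The waiting logic described above can be sketched as follows (names are illustrative, not the synapseutils API): the tasks that resolve the Synapse IDs of the referenced files are gathered, `asyncio.wait` blocks until all of them complete, and only then does the save proceed.

```python
import asyncio


async def resolve_id(name: str) -> str:
    """Hypothetical stand-in for resolving/uploading a referenced file."""
    await asyncio.sleep(0)
    return f"syn-{name}"


async def save_with_dependencies(deps: list[str]) -> str:
    # Launch ID resolution for every referenced file concurrently.
    tasks = [asyncio.create_task(resolve_id(d)) for d in deps]

    # asyncio.wait defaults to ALL_COMPLETED: it returns once every
    # dependency ID is available, leaving nothing pending.
    done, pending = await asyncio.wait(tasks)
    assert not pending
    dep_ids = sorted(t.result() for t in done)

    # With all referenced IDs resolved, the save can finally execute.
    return f"saved(refs={dep_ids})"


print(asyncio.run(save_with_dependencies(["B", "C"])))
# saved(refs=['syn-B', 'syn-C'])
```

With this structure, nodes that have no outbound edges resolve immediately, and each parent's save fires as soon as all of its children's IDs are available.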
Verified that the upload process progress bar looks good: