Improve save-to-network performance - Githubissues

NCEAS / morpho

Morpho metadata editor

GNU General Public License v2.0


Improve save-to-network performance #984

Closed mbjones closed 5 months ago

mbjones commented 6 years ago

Author Name: ben leinfelder (ben leinfelder) Original Redmine Issue: 5823, https://projects.ecoinformatics.org/ecoinfo/issues/5823 Original Date: 2013-01-24 Original Assignee: Jing Tao

Jing saved data packages with binary data entities from Morpho to a Metacat MN to test the performance of those save operations. There seems to be some overhead for this no matter how small the data file.

File size    Total time (s)
1.4 K        10
395 K        10
1.3 M        15
5.8 M        26
8.8 M        33
50 M         115
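To separate the fixed per-save overhead from the size-dependent cost, one option is a least-squares line through the measurements above. The following is a minimal sketch, not part of Morpho: the class name `SaveTimeFit` is hypothetical, and it assumes the sizes are decimal kilobytes and megabytes.

```java
// Hypothetical helper: fits total save time against file size.
final class SaveTimeFit {
    // File sizes from the table, converted to MB (assumes K = 1000 bytes, M = 10^6 bytes).
    static final double[] SIZE_MB = {0.0014, 0.395, 1.3, 5.8, 8.8, 50.0};
    // Total save times from the table, in seconds.
    static final double[] TOTAL_S = {10, 10, 15, 26, 33, 115};

    /** Returns {intercept, slope} of the least-squares line y = intercept + slope * x. */
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0;
        double meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0;
        double sxx = 0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        double slope = sxy / sxx;
        return new double[] {meanY - slope * meanX, slope};
    }

    public static void main(String[] args) {
        double[] line = fit(SIZE_MB, TOTAL_S);
        // The intercept estimates the fixed overhead; the slope estimates seconds per MB.
        System.out.printf("fixed overhead: %.1f s, per-MB cost: %.2f s/MB%n", line[0], line[1]);
    }
}
```

With only six data points this is a rough estimate, but the intercept gives a number to compare against after any optimization of the per-call setup.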

mbjones commented 6 years ago

Original Redmine Comment Author Name: ben leinfelder (ben leinfelder) Original Date: 2013-01-25T07:38:28Z

I think in all cases we have the following calls:
- check if EML file exists
- generate new ID if so
- check if data file exists
- generate new ID if so
- submit EML + SystemMetadata bytes
- submit data + SystemMetadata bytes
- submit ORE + SystemMetadata bytes

For each call we do a lot of local work locating and setting up the client certificate and CA truststore for the SSL connection, each time starting from scratch. This can't come cheap and I wonder if there is room for improvement there. Perhaps the DataONE client can maintain a bit more state than it currently does. The CA truststore won't change from call to call, whereas the client certificate could.
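One way to keep that state is to build the trust managers once and cache one `SSLContext` per client certificate. The sketch below uses only standard `javax.net.ssl` and `java.security` classes; the class `SslContextCache`, its methods, and the `certId` key are hypothetical and are not the DataONE client's actual API.

```java
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;

// Hypothetical cache: trust managers are built once, SSL contexts once per certificate.
final class SslContextCache {
    private final TrustManager[] trustManagers;
    private final Map<String, SSLContext> contexts = new ConcurrentHashMap<>();

    /** Builds the trust managers once; a null truststore selects the JRE default CAs. */
    SslContextCache(KeyStore caTrustStore) {
        try {
            TrustManagerFactory tmf =
                    TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
            tmf.init(caTrustStore);
            this.trustManagers = tmf.getTrustManagers();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not load CA truststore", e);
        }
    }

    /**
     * Returns the cached context for certId, building it on first use.
     * certId is an assumed caller-chosen key that must change whenever the
     * client certificate changes (for example, file path plus modification time).
     */
    SSLContext contextFor(String certId, KeyStore clientKeys, char[] password) {
        return contexts.computeIfAbsent(certId, id -> build(clientKeys, password));
    }

    private SSLContext build(KeyStore clientKeys, char[] password) {
        try {
            KeyManagerFactory kmf =
                    KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
            kmf.init(clientKeys, password);
            SSLContext ctx = SSLContext.getInstance("TLS");
            ctx.init(kmf.getKeyManagers(), trustManagers, null);
            return ctx;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not build SSL context", e);
        }
    }

    /** Demo stand-in for loading the user's real client certificate keystore. */
    static KeyStore emptyKeyStore() {
        try {
            KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
            ks.load(null, null); // initializes an empty in-memory keystore
            return ks;
        } catch (IOException | GeneralSecurityException e) {
            throw new IllegalStateException("Could not create keystore", e);
        }
    }
}
```

Under this design the seven calls of a single save would share one context as long as the certificate key stays the same, and a renewed certificate simply maps to a new key.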

mbjones commented 6 years ago

Original Redmine Comment Author Name: ben leinfelder (ben leinfelder) Original Date: 2013-01-25T21:21:56Z

With my NCEAS network connection I did a small and big data file:
- big (50 MB) -- 18.5 s
- small (119 bytes) -- 8.3 s

mbjones commented 6 years ago

Original Redmine Comment Author Name: Jing Tao (Jing Tao) Original Date: 2013-02-21T01:33:32Z

I ran a test saving nearly identical data packages (an EML document describing the same data file) to the network only and to both locations.

Interestingly, saving to both locations (10 seconds) took much less time than saving to the network only (14 seconds).

Here are some details:

It takes about 1 to 2 seconds to create a small D1Object on the network. It takes about 0.2 seconds to get a network-generated ID.

The saving process saved 3 objects (a resource map, an eml document and a data file) and it took about 5 to 6 seconds.

My guess is that the save-both action took less time because it reads from the local copies to display the new data package, whereas the save-network-only process reads the package back from the network. If we add the saved D1Objects to the cache in the d1_libclient_java module, the save-network-only process should get faster.
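The caching idea can be sketched as a write-through cache: after an object is saved to the network successfully, its bytes are kept locally so the follow-up read does not need another round trip. This is an illustration only; `RemoteStore` and `WriteThroughCache` are hypothetical names and do not reflect the actual cache in d1_libclient_java.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the network client (create and get by identifier).
interface RemoteStore {
    void create(String id, byte[] content);
    byte[] get(String id);
}

// Hypothetical write-through cache with least-recently-used eviction.
final class WriteThroughCache {
    private final RemoteStore remote;
    private final Map<String, byte[]> cache;

    WriteThroughCache(RemoteStore remote, final int maxEntries) {
        this.remote = remote;
        // An access-ordered LinkedHashMap evicts the least recently used entry.
        this.cache = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Saves to the network first; the object is cached only if that succeeds. */
    synchronized void save(String id, byte[] content) {
        remote.create(id, content);
        cache.put(id, content.clone());
    }

    /** Returns the cached copy when present, otherwise fetches and caches it. */
    synchronized byte[] read(String id) {
        byte[] hit = cache.get(id);
        if (hit != null) {
            return hit.clone();
        }
        byte[] fetched = remote.get(id);
        if (fetched != null) {
            cache.put(id, fetched.clone());
        }
        return fetched;
    }
}
```

For simplicity the sketch holds its lock during network calls and keeps whole objects in memory; a real cache would likely bound total bytes and avoid caching large data files.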

mbjones commented 6 years ago

Original Redmine Comment Author Name: Redmine Admin (Redmine Admin) Original Date: 2013-03-27T21:31:47Z

Original Bugzilla ID was 5823