dmwm / DBS

CMS Dataset Bookkeeping Service

Refactor codebase to use independent module to parse incoming HTTP requests #618

Open vkuznet opened 4 years ago

vkuznet commented 4 years ago

This PR tries to address the large memory footprint of the DBS server; see the full discussion in https://github.com/dmwm/DBS/issues/599

The code is refactored in the following way:

Due to the dynamic nature of Python memory allocation it is hard to evaluate the impact of a particular data format on a long-running DBSServer, but this PR makes it easy to switch between formats and test them. To do that, clients interacting with the DBS server will need to send data in the proper format, e.g. json_stream, so that we can measure the memory footprint of the DBS server in that case.
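As an illustration of such a client, here is a minimal sketch of how a payload could be serialized with convert2json_stream and POSTed to a DBS endpoint. The endpoint URL, the payload fields, and the assumption that the server picks its parser based on the incoming request are hypothetical; only convert2json_stream itself comes from this PR.

# client-side sketch: serialize the payload into json_stream form and POST it
import io
import requests

from dbs.utils.parsers import convert2json_stream

payload = {"dataset": "/a/b/c", "files": [1, 2, 3]}   # made-up payload for illustration

buf = io.StringIO()
convert2json_stream(payload, buf)                     # write the json_stream representation into buf

resp = requests.post(
    "https://dbs.example.org/SOME_API",               # hypothetical endpoint, not a real DBS URL
    data=buf.getvalue(),
    headers={"Content-Type": "application/json"},
)
print(resp.status_code)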

The provided convert2json_stream function can convert either a given JSON (dict) object or a file object that contains a JSON data stream, e.g.

# example of how to convert JSON to json_stream
from dbs.utils.parsers import convert2json_stream

data = {"data": 1, "foo": [1, 2, 3]}
convert2json_stream(data)
# this will produce the following output
{
"foo"
:
[1
, 2
, 3
]
,
"data"
:
1
}
# to write this output to a file, pass an open file object as the second argument
with open('YOUR_FILE_NAME', 'w') as obj:
    convert2json_stream(data, obj)

# similarly, if you have a file object containing a JSON stream, you can pass it directly
with open('YOUR_FILE.json') as fobj:
    convert2json_stream(fobj)

# similarly, the provided convert2yaml function converts a given JSON object to YAML
from dbs.utils.parsers import convert2yaml  # assuming it lives in the same parsers module

data = {"data": 1, "foo": [1, 2, 3]}
print(convert2yaml(data))
# this will produce the following output
data: 1
foo:
- 1
- 2
- 3

With this module we can perform various tests on the DBS server using different input data formats.
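As a rough, self-contained sketch of the kind of measurement this enables, the snippet below compares the peak memory needed to materialize a payload parsed as one large JSON document versus consumed record by record from a stream. It uses only the standard library, a made-up payload, and a simplified one-record-per-line stream rather than the exact json_stream layout produced by the parsers module, so it illustrates the idea rather than this PR's implementation.

# compare peak memory: whole-document parsing vs record-by-record streaming
import io
import json
import tracemalloc

records = [{"lfn": "file_%d" % i, "size": i} for i in range(100000)]  # made-up records
text = "\n".join(json.dumps(r) for r in records)                      # one record per line

tracemalloc.start()
doc = json.loads("[" + text.replace("\n", ",") + "]")            # parse everything at once
total_doc = sum(r["size"] for r in doc)
_, peak_doc = tracemalloc.get_traced_memory()
tracemalloc.stop()

tracemalloc.start()
stream = io.StringIO(text)
total_stream = sum(json.loads(line)["size"] for line in stream)  # one record in memory at a time
_, peak_stream = tracemalloc.get_traced_memory()
tracemalloc.stop()

print("peak memory, whole document:", peak_doc)
print("peak memory, record stream: ", peak_stream)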

yuyiguo commented 4 years ago

Valentin,

I will look into this after the DBS partition work is done. It may take a few months.