cms-PdmV / cmsPdmV

CERN CMS McM repository

McM data-service returns malformed output (string object instead of valid JSON) #890

Closed vkuznet closed 6 years ago

vkuznet commented 6 years ago

Hi, it seems to me that the McM data-service no longer returns proper JSON documents. I was asked why DAS McM queries fail, and I found the following.

If you look at the output of this call (which DAS makes):

https://cms-pdmv.cern.ch/mcm/public/restapi/requests/produces/ADDGravToLL_LambdaT-10000_M-1700_13TeV-pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM

you'll see that it returns a string object rather than a JSON dump.

To understand this, here is what a JSON dump looks like in Python when it is written to a file properly:

import json

# Serialize the dictionary once and write the resulting text to a file.
with open("foo.json", "w") as ostream:
    ostream.write(json.dumps({"foo": 1}))

The content of foo.json looks like:

cat foo.json
{"foo": 1}

So there are no quotes around the curly brackets. The McM output is now a plain string object, which a JSON parser fails to read as a dictionary.
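
The difference is easy to reproduce on the client side. The sketch below is only an illustration, not DAS code: `parse_response` is a hypothetical helper name, and the sample payload is the `{'foo': 1}` dictionary from above. It shows that a doubly encoded body still parses, but yields a `str` instead of a `dict`, and how a client could tolerate one extra layer of encoding.

```python
import json

# A correctly encoded response body: JSON text for a dictionary.
good_body = json.dumps({"foo": 1})

# A doubly encoded body: the JSON text has itself been encoded
# again as a JSON string, so it is wrapped in quotes.
bad_body = json.dumps(good_body)


def parse_response(body):
    """Decode a JSON body, unwrapping one extra layer of string encoding.

    Hypothetical helper for illustration only.
    """
    data = json.loads(body)
    if isinstance(data, str):
        try:
            # The payload was a JSON string that may contain JSON text.
            data = json.loads(data)
        except ValueError:
            # It was an ordinary string after all; return it unchanged.
            pass
    return data


print(type(json.loads(good_body)).__name__)
print(type(json.loads(bad_body)).__name__)
print(parse_response(bad_body))
```

Such a workaround only hides the symptom on the client; the proper fix is for the service to encode the payload once.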

I urge the responsible people to fix the problem, since all users are affected by it.

Best, Valentin.

anorkus commented 6 years ago

Hi,

After the big migration to a new high-performance Python framework, the default return value of all APIs is automatically packed into a string. A few APIs were missed, where the return object was still packed manually with json.dumps.

In any case, it was returning proper JSON data (json.dumps returns a string object); it was just packed twice.
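
A minimal sketch of how this kind of double packing can arise, assuming a framework layer that serializes every return value. `pack_response`, the two handlers, and the payload fields are hypothetical names for illustration and are not taken from the McM code.

```python
import json


def pack_response(result):
    """Hypothetical stand-in for a layer that serializes every return value."""
    return json.dumps(result)


def handler_missed():
    # An endpoint that still serializes by hand, as it did before the migration.
    return json.dumps({"results": [], "message": "ok"})


def handler_fixed():
    # After the fix: return the plain object and let the outer layer pack it.
    return {"results": [], "message": "ok"}


double = pack_response(handler_missed())  # packed twice
single = pack_response(handler_fixed())   # packed once

print(double)
print(single)
```

Decoding `single` once gives the dictionary back, while `double` has to be decoded twice, which is why clients expecting a dictionary after one `json.loads` call failed.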

vkuznet commented 6 years ago

Antanas, may I ask which high-performance Python framework you chose and why that choice was made? Valentin.

anorkus commented 6 years ago

Hi,

We went with the Flask microframework. We were already using it for every new web service; CherryPy was only used for McM (which is our main service).

Technically we could even have gone with Bottle, since we do not use HTML templates (it is the same as Flask, just without a template engine and in a single file). CherryPy is thread-based, which means it is slow with parallel unique requests and consumes a lot of memory. In our case CherryPy would grow to 1 GB of RAM within an hour (and keep it forever). Flask has not consumed more than 900 MB over weeks. Flask is event-based.

Flask has explicit routing, which is nice because you always know the correct URL. This caused some issues with the Unified scripts, where the URL was actually malformed and CherryPy would still succeed.

The McM service still underperforms due to early design choices (Python memory locks), which limit us to a single process. In my mind we would go with multiprocess Flask (a uWSGI app) + nginx, although Flask's Werkzeug dev web server is handling the load fine.

For us CherryPy started slowing down at loads heavier than 30-40 requests/sec. Flask runs smoothly.

After a longer run CherryPy would start to slow down when serving static files (my guess was memory usage and threads); Flask has no such issue. Solving this problem was the cherry on the cake after the migration.

vkuznet commented 6 years ago

Antanas, thanks for the clarification. I know Flask. I only wish you had brought it to a CMS C&O meeting before the migration. There are a few aspects of scalability that concern me, and even though Flask is better in this regard than CherryPy, it is Python-based and has the same famous GIL issue. I would rather we, as a collaboration, move to Go for web development and save the time and energy of rewriting our Python-based web apps. Best, Valentin.

anorkus commented 6 years ago

Well, honestly, people in cmsweb know my opinion about the proposal to migrate to Go...

If I can improve my service code without changing the language, I would prefer that option. It is not only the GIL that hurts cmsweb performance: if somebody wrote code like this: https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/REST/Server.py#L192 people might do the same in Go or any other language, and the services would underperform all the time.

In general, as a collaboration, how many decent Go developers do we have? Not all developers will start coding in Go; some may want to stick with Python. I can only imagine the amateurish mistakes the "new" Go developers would

P.S. The McM migration from CherryPy to Flask took roughly a week, including all tests. I can only imagine the time it would take to migrate to Go.

vkuznet commented 6 years ago

You make a point: even with a group of Python developers we still write inefficient code. Therefore, even moving to a new Python framework will not solve that part of the problem, nor will it solve the Python-related issues.

Regarding the time to migrate to Go: I learnt its syntax on my flight to CERN, and I migrated DAS to Go in roughly 1 week too (and I barely wrote the web service part, which is provided by the Go standard library). And, once I learnt it, I don't think we should write web-related stuff in Python anymore. It is not the right tool to do the job.

anorkus commented 6 years ago

Well, you still miss my main point. Everyone makes mistakes and writes bad code (me included). People should first fix the code/design instead of trashing the project and deciding to use a different language. The piece of code I pointed to earlier is only one such example, but nobody even thought that it could cause poor web server performance. Recently Lina did a scalability test; the numbers for static file queries were very poor, and I assume that this piece of "code" was the main cause.

Your proposal raises more questions than it answers. Each person is different: if it took you 1 week to learn, that doesn't mean everyone will do the same. I have been developing in Python for 5 years and wouldn't say that I know Python... If people made bad design choices when building Python web services, it is likely they will make similar mistakes in any other language. Why exactly Golang? Because you learnt it fast? It is already 7 years old, and there are other languages with newer ideas. Be like most of the industry: write web services in Rust or Node.js, hell, even try Kotlin...

My utopian idea would be for cmsweb to support web services in any/multiple programming languages (run them in containers, etc.).

"I don't think we should write web related stuff in python anymore. It is not the right tool to do the job." I totally disagree here: if a person made a huge design mistake and it doesn't scale, it's not the language, it's the developer. DBS is Python and works great, while the DAS web interface is the opposite.

vkuznet commented 6 years ago

Antanas, I think we should stop here. The issue is closed.

But since you asked: Go is a statically typed, garbage-collected language with built-in concurrency and a full stack for web development (including templates) in its standard library. I doubt that industry is heading towards Rust/Node.js; if they target concurrency, they choose a different route. And, for the record, container tools (Docker, Kubernetes) are written in Go.
