dmwm / DBS

CMS Dataset Bookkeeping Service
Apache License 2.0
7 stars 21 forks source link

Support for dev/int/prod instances in url #84

Closed ericvaandering closed 10 years ago

ericvaandering commented 10 years ago

Original TRAC ticket 2280 reported by giffels Adding support for multiple instances as described in ticket #1437. This item was already discussed in Bristol.

ericvaandering commented 10 years ago

Author: giffels Added support for multiple instances to DBSReaderModel, DBSWriterModel and the DBS configuration. I have already added support for accessing multiple DBs in a single RESTModel to the configuration and the secrets file, since we are going to merge DBSReader and DBSWriter in near future.

ericvaandering commented 10 years ago

Author: giffels The new format of DBSSecrets is now

connectUrl = {'ProductionGlobal' :
              {'reader':'oracle://owner:secret@instance',
               'writer':'oracle://owner:secret@instance'},
              'DevelopmentGlobal':
              {'reader':'oracle://owner:secret@instance',
               'writer':'oracle://owner:secret@instance'},
              'IntegrationGlobal':
              {'reader':'oracle://owner:secret@instance',
               'writer':'oracle://owner:secret@instance'}
              }
databaseOwner = {'ProductionGlobal':
                 {'reader':'owner',
                  'writer':'owner'},
              'DevelopmentGlobal':
                 {'reader':'owner',
                  'writer':'owner'},
              'IntegrationGlobal':
                 {'reader':'owner',
                  'writer':'owner'}
ericvaandering commented 10 years ago

Author: giffels Hi Yuyi,

could you please review and commit the patch to subversion. In addition, I think we should put it into DBS_3_0_12.

The patch has been made against the input-validation branch.

Thanks, Manuel

ericvaandering commented 10 years ago

Author: yuyi Hi, Manuel:

Here is my comment on the patch: 1)please build the patch against the svn trunk instead of input-validation branch as the branch is out of date. 2)under the same instance, both reader and writer should have the exactly same owner. Provide two possible owners in DBSSecrets could be error-prone so I'd have one place to list only one owner for both reader and writer. 3) please test your patch using unit tests before submit.

ericvaandering commented 10 years ago

Author: giffels Replying to [comment:4 yuyi]:

Hi, Manuel:

Here is my comment on the patch: 1)please build the patch against the svn trunk instead of input-validation branch as the branch is out of date.

Done.

2)under the same instance, both reader and writer should have the exactly same owner. Provide two possible owners in DBSSecrets could be error-prone so I'd have one place to list only one owner for both reader and writer.

Done. We have now the following format.

connectUrl = {'ProductionGlobal' :
              {'reader':'oracle://owner:secret@instance',
               'writer':'oracle://owner:secret@instance'},
              'DevelopmentGlobal':
              {'reader':'oracle://owner:secret@instance',
               'writer':'oracle://owner:secret@instance'},
              'IntegrationGlobal':
              {'reader':'oracle://owner:secret@instance',
               'writer':'oracle://owner:secret@instance'}
              }
 databaseOwner = {'ProductionGlobal':'owner',
                 'DevelopmentGlobal':'owner,
                 'IntegrationGlobal':'owner'
                }

3) please test your patch using unit tests before submit.

They are not running out of the box in a rpm environment. Do you have an updated setup.sh script for a rpm environment? If not, I am going to write one tomorrow. Therefore, I will upload the updated patch tomorrow.

Manuel

ericvaandering commented 10 years ago

Author: giffels Replying to [comment:4 yuyi]:

3) please test your patch using unit tests before submit.

Done. Unittests needed changes to run with input validation and instances. Now all tests (Server and Client) passed. Could you review the patch, please.

Cheers, Manuel

ericvaandering commented 10 years ago

Author: yuyi Reviewing it now. Please let me know the server URLs created using this patch.

ericvaandering commented 10 years ago

Author: giffels Please, go to https://dbs3-dastestbed.cern.ch/dbs/, all instances are listed there.

ericvaandering commented 10 years ago

Author: valya I have two comments on this:

  1. Does it means that from now on DBS3 will only use HTTPS. Previously I used HTTP. Does authentication via certificate will be required?
  2. The provided instances does not accept pattern for datasets API, e.g. https://localhost:8979/dbs/prod/global/DBSReader/datasets/?dataset_access_type=PRODUCTION&dataset=_Zee_

returns the following error

Invalid Input Data: Not Match Required Format

Is it part of new input validation constraints or pattern schema (requirement of three slashes)? We agreed a long time ago that datasets must support patterns. I provided enough arguments that this specific API needs to support pattern, and usage of 3 slash requirement is not applicable here. The DBS3 instance on vocms09 does work though, see

http://vocms09.cern.ch:8989/dbs/DBSReader/datasets/?dataset_access_type=PRODUCTION&dataset=*Zee*

ericvaandering commented 10 years ago

Author: yuyi Hi, Valentin:

https://dbs3-dastestbed.cern.ch/dbs/ is a VM so it has the same environment as cmsweb. Therefore, it is https and requires cert. Look like you already have that setup since you did get responding from the server.

Dataset API and most other APIs accept patterns. For dataset, block a leading "/" is required, but not three "/". (This is subject to review by Simon.) A list of supported dataset and block patterns can be found at https://svnweb.cern.ch/trac/CMSDMWM/browser/WMCore/trunk/test/python/WMCore_t/Lexicon_t.py.

ericvaandering commented 10 years ago

Author: yuyi Hi, Manuel:

I reviewed your patch. Two comments: 1.".../tests/dbsclient_t/validation/DBSValitaion_t.py" is completed removed, then a new file ".../dbsclient_t/validation/DBSValidation_t.py" created, why? This make reviewer have a hard time to compare what was changed.

2.On the server side test, You added "Server/Python/tests/run_tests.sh" to run under rpm. This is good. But you completed removed "Server/Python/tests/setup.sh". This is a component test and should be able to run without rpm. It is good we can run under both environments. Can you please recover setup.py?

ericvaandering commented 10 years ago

Author: valya Replying to [comment:11 yuyi]:

Dataset API and most other APIs accept patterns. For dataset, block a leading "/" is required, but not three "/". (This is subject to review by Simon.) A list of supported dataset and block patterns can be found at https://svnweb.cern.ch/trac/CMSDMWM/browser/WMCore/trunk/test/python/WMCore_t/Lexicon_t.py.

Don't take me wrong but I don't understand how extra slash constrain in pattern makes any difference. Since all datasets/blocks starts with slash it is really unnecessary in patterns since you allow wild-card right after the slash. I rather prefer to see a real argument why it is required or better an example which shows that I can get different set of datasets using /* query instead of *.

ericvaandering commented 10 years ago

Author: giffels Replying to [comment:12 yuyi]:

Hi, Manuel:

I reviewed your patch. Two comments: 1.".../tests/dbsclient_t/validation/DBSValitaion_t.py" is completed removed, then a new file ".../dbsclient_t/validation/DBSValidation_t.py" created, why? This make reviewer have a hard time to compare what was changed.

I used git mv to move the file, since there was a typo in the file name I'd like to fix. I thought that git is able to track such changes easily. Sorry, for the inconvience.

2.On the server side test, You added "Server/Python/tests/run_tests.sh" to run under rpm. This is good. But you completed removed "Server/Python/tests/setup.sh". This is a component test and should be able to run without rpm. It is good we can run under both environments. Can you please recover setup.py?

I used git mv to move the file and to give it a more reasonable name. I think we should keep the repository clean, therefore I suggest not to recover setup.sh. I'd suggest to add component tests to run_test.sh in a way, that they can run without rpm installation. Is that a acceptable solution? But anyhow we should not develop in a non rpm environment.

ericvaandering commented 10 years ago

Author: yuyi Replying to [comment:13 valya]:

Don't take me wrong but I don't understand how extra slash constrain in pattern makes any difference. Since all datasets/blocks starts with slash it is really unnecessary in patterns since you allow wild-card right after the slash. I rather prefer to see a real argument why it is required or better an example which shows that I can get different set of datasets using /* query instead of *.

Valentin:

Simon is currently reviewing the input validation. You may want to direct your concern to him. Changing a leading "/" in dataset/block search is NOT an issues from database side. People are arguing about how to make our input validation to max the protection to the server from possible attack.

ericvaandering commented 10 years ago

Author: metson I think the thing to do is make searchdataset/searchblock accept * or ///* type of queries. Adding that isn't hard. Then queries are either "everything" (e.g. *) or are constrained to a dataset/block pattern, as opposed to the very loose regexp that was there before.

ericvaandering commented 10 years ago

Author: giffels Replying to [comment:14 giffels]:

Replying to [comment:12 yuyi]:

2.On the server side test, You added "Server/Python/tests/run_tests.sh" to run under rpm. This is good. But you completed removed "Server/Python/tests/setup.sh". This is a component test and should be able to run without rpm. It is good we can run under both environments. Can you please recover setup.py?

I used git mv to move the file and to give it a more reasonable name. I think we should keep the repository clean, therefore I suggest not to recover setup.sh. I'd suggest to add component tests to run_test.sh in a way, that they can run without rpm installation. Is that a acceptable solution? But anyhow we should not develop in a non rpm environment.

I have now added a setup_test.py, which is based on distutils, because it is mightier . It is possible to run all component tests as long as the environment is set-up correctly. For rpm based installations this is done by

source /data/current/apps/dbs/etc/profile.d/init.sh

In addition, you need to specify a valid configuration file. To run all web-layer tests just use

python setup_test.py test_system --config=/data/current/config/dbs/DBS.py --weball

You can see all available option by using

python setup_test.py test_system --config=/data/current/config/dbs/DBS.py

Available options are:

--web=DBSReader or --web=DBSWriter to run unittests on the web-layer
--weball to run reader and writer unittests on the web-layer
--dao to run unittests on the dao-layer
--business to run unittests on the business-layer
--allTest to run unittests on all sub-layers

Could you review and commit the patch, please?

Cheers, Manuel

ericvaandering commented 10 years ago

Author: valya Replying to [comment:16 metson]:

I think the thing to do is make searchdataset/searchblock accept * or ///* type of queries. Adding that isn't hard. Then queries are either "everything" (e.g. *) or are constrained to a dataset/block pattern, as opposed to the very loose regexp that was there before.

Simon, I still don't get the argument. If you allow everything what's the point of restrict to specific pattern with three slashes, how about these examples: /_/_QCD* or QCD/RECO. Could you please provide THE argument behind requiring three slashes. As far as I understood that you already agreed to query everything.

ericvaandering commented 10 years ago

Author: metson I would prefer not to allow a * query - what's the use case once DAS is in place? You could imagine a "loose" query returning a small amount of data (just the names, for instance) and a tighter query returning more detailed information (sizes, number of files etc).

We should be restricting search as much as possible to limit db load and speed up response. Some searches are just lazy and, as you often point out, we're not Google. I personally don't see a problem in requiring that people learn to follow the pattern, especially for a tool that's not user facing. That does make DAS more complicated, and we need to tension those two products.

PS. this is totally off topic for this ticket. Can we carry on if necessary elsewhere? Suggest a new ticket.

ericvaandering commented 10 years ago

Author: valya Replying to [comment:19 metson]:

I would prefer not to allow a * query - what's the use case once DAS is in place? You could imagine a "loose" query returning a small amount of data (just the names, for instance) and a tighter query returning more detailed information (sizes, number of files etc).

We should be restricting search as much as possible to limit db load and speed up response. Some searches are just lazy and, as you often point out, we're not Google. I personally don't see a problem in requiring that people learn to follow the pattern, especially for a tool that's not user facing. That does make DAS more complicated, and we need to tension those two products.

Originally this question raised in #1947. Feel free to go back to this ticket or create a new one. Meanwhile here is my replies to your question.

what's the use case once DAS is in place? Start from user point of view. We claim that to discover data we need first to discover datasets. User knows some pattern and is willing to find data associated with it. Let's make it pragmatic and claim that user searches Top samples. How will you search for it using three slash restriction? Top or /Top//, /_/Top/_, etc.? Are you willing to explain this restriction to end-users? Are you asking DAS to perform all possible permutation of the pattern and slashes? What if user willing to specify a couple of pattens in a search field, e.g. Foo and RE? The slashes will lead to unnecessary combinatorics. I don't think such restriction is appealing to end-users and worth to coding at app level. Also I don't think that ORACLE can't efficiently handle thousands, even million rows which are properly indexed. I don't see ANY benefits of using slash requirement in patterns, since all datasets starts with slash. From your description I still don't see any real argument to impose such restriction.

I'm not asking for wild-pattern on every DBS table/API, but for dataset API it seems more then appropriate without unnecessary complexity at app level or user UI.

ericvaandering commented 10 years ago

Author: hufnagel I think requiring the users to have some idea of what they are looking for is not too much to ask for, is it ? Do we really want to allow wildcard searches across Primary Dataset and Processed Dataset at the same time ?

IMHO, searching for datasets, you can use wildcards, but you need to specify the dataset syntax and specify where you want the wildcards. So Top throws an error. If they want Top, they can run the two searches /Top//Tier and //Top/Tier (ignoring he tier here because I don't see the use case for wildcards there at all). They are different searches for different things.

ericvaandering commented 10 years ago

Author: metson Can we please move discussion of the query language to another ticket. It's totally off topic here.

ericvaandering commented 10 years ago

Author: yuyi I created a new ticket(#2326) for discussing the search patterns. Please move related discussion there.

ericvaandering commented 10 years ago

Author: yuyi Hi, Manuel:

Unfortunately, The unit tests could not be run even under rpm installed environment. (I did not try on development one since the setup_test.py does not set proper environment for testing. This has to be fixed later.)

Here is what I did:

cd /uscms/home/yuyi/dbs3rpm #the root of rpm installation Source current/apps/dbs/etc/profile.d/init.sh cd /uscms/home/yuyi/dbs3-test/DBS/Server/Python/tests python setup_test.py test_system --config=/data/current/config/dbs/DBS.py --web=DBSWriter

running test_system module_names = ['dbsserver_t.unittests.web_t.DBSWriterModel_t'] Traceback (most recent call last): File "setup_test.py", line 120, in cmdclass = { 'test_system': TestCommand}) File "/uscms/home/yuyi/dbs3rpm/deploymenttest-3011b/sw.pre/slc5_amd64_gcc434/external/python/2.6.4-comp6/lib/python2.6/distutils/core.py", line 152, in setup dist.run_commands() File "/uscms/home/yuyi/dbs3rpm/deploymenttest-3011b/sw.pre/slc5_amd64_gcc434/external/python/2.6.4-comp6/lib/python2.6/distutils/dist.py", line 975, in run_commands self.run_command(cmd) File "/uscms/home/yuyi/dbs3rpm/deploymenttest-3011b/sw.pre/slc5_amd64_gcc434/external/python/2.6.4-comp6/lib/python2.6/distutils/dist.py", line 995, in run_command cmd_obj.run() File "setup_test.py", line 96, in run loadedTests = unittest.TestLoader().loadTestsFromNames(module_names) File "/uscms/home/yuyi/dbs3rpm/deploymenttest-3011b/sw.pre/slc5_amd64_gcc434/external/python/2.6.4-comp6/lib/python2.6/unittest.py", line 615, in loadTestsFromNames suites = [self.loadTestsFromName(name, module) for name in names] File "/uscms/home/yuyi/dbs3rpm/deploymenttest-3011b/sw.pre/slc5_amd64_gcc434/external/python/2.6.4-comp6/lib/python2.6/unittest.py", line 586, in loadTestsFromName parent, obj = obj, getattr(obj, part) AttributeError: 'module' object has no attribute 'DBSWriterModel_t'

Any ideas?

Thanks, yuyi

ericvaandering commented 10 years ago

Author: yuyi Hi, Manuel:

I got the writer test run. It was an error from copy/paste for the configure file. I used --config=/data/current/config/dbs/DBS.py, but I don't have it. I don't understand why it did not report an error like "configuration file not found", but an error on DBSWriterModule_t. It is very misleading. ideas?

I need to apply the patch you gave me this morning to run the reader test. Hope it is all fine.

Cheers, yuyi

ericvaandering commented 10 years ago

Author: giffels Hi Yuyi,

I have attached the config also to this ticket. I will add a few lines of code to check if the file exists. Sorry, for that.

Cheers, Manuel

ericvaandering commented 10 years ago

Author: giffels I have added a new version that checks the existances of the config file. The patch to the reader unittests is also include in the tarball.

Cheers, Manuel

ericvaandering commented 10 years ago

Author: yuyi Hi, Manuel:

Both read and write tests are working now. But we still need some work to get it run properly. I am working on to make the no-rpm env working. Can you fix the info.dict file? Currently it is generated in Server/Python/tests dir. The correct place to have this file is in the sub-dir of each test. For example, DBSWriterModel_t.py should generates the info file in web_t. So when there are multiple tests running, their info files will not be mixed up.

In addition, when you generate a new patch, can you only include the new fix, not everything that already include in the previous patches? It helps me to apply the new patch on top of the old patches.

Cheers, yuyi

ericvaandering commented 10 years ago

Author: giffels Replying to [comment:28 yuyi]:

Hi, Manuel:

Both read and write tests are working now. But we still need some work to get it run properly. I am working on to make the no-rpm env working. Can you fix the info.dict file? Currently it is generated in Server/Python/tests dir. The correct place to have this file is in the sub-dir of each test. For example, DBSWriterModel_t.py should generates the info file in web_t. So when there are multiple tests running, their info files will not be mixed up.

web-layer unittests are now generating the info.dict in the web_t directory. Actually, that was not the case before writing setup_test.py, therefore I did not force it.

What are you planning to do on the setup_test.py? In principle, non rpm environments are supported. You just need to setup DBS3_SERVER_ROOT manually and set the PYTHONPATH correctly.

Cheers, Manuel

ericvaandering commented 10 years ago

Author: yuyi Replying to [comment:29 giffels]:

Replying to [comment:28 yuyi]:

web-layer unittests are now generating the info.dict in the web_t directory. Actually, that was not the case before writing setup_test.py, therefore I did not force it.

Good. Will apply the patch. In the old code, the web layer tests were run in the web_t dir so it generated the info file there by default. When you move the running tests up to tests dir, it will generate the file in tests by default. So you need to force it to the correct place to put the file.

What are you planning to do on the setup_test.py? In principle, non rpm environments are supported. You just need to setup DBS3_SERVER_ROOT manually and set the PYTHONPATH correctly.

""" if not os.environ.get("DBS3_SERVER_ROOT"): if not os.environ.get("DBS3_ROOT"): print """You have to source init.sh before running unittests\n If you are using rpm based development environment on a VM,\n try to source /data/current/apps/dbs/etc/profile.d/init.sh.\n If your are using development environment, source Server/Python/control/setup.sh.\n It will point to the base directory of your DBS3 installation to $DBS3_ROOT and that PYTHON_PATH\n is set-up correctly.""" sys.exit(1) else: os.environ['DBS3_SERVER_ROOT'] = os.environ['DBS3_ROOT'] """

Looks like we are close to commit it.

Cheers, Yuyi

Cheers, Manuel

ericvaandering commented 10 years ago

Author: giffels Okay, go ahead.

Cheers, Manuel

ericvaandering commented 10 years ago

Author: yuyi Done. Commit to SVN.