dmwm / dbs2go

DBS server written in Go
MIT License
5 stars 4 forks source link

migrationSubmit reports back the wrong ID #66

Closed belforte closed 2 years ago

belforte commented 2 years ago

I submitted a migration request for one block of an AOD dataset I wanted to test a dataset high in the chain with liimited parentage (RAW only).

So I asked for: {'migration_url': 'https://cmsweb.cern.ch/dbs/prod/global/DBSReader', 'migration_input': '/PAMinimumBias1/PARun2016C-PromptReco-v1/AOD#ae9a22f0-b51a-11e6-91fa-001e67abf228'}

and got the following result which refers to at block in the RAW dataset !!

== MIGRATION SUBMISSION RESULT ===
{'error': None,
 'migration_details': {'create_by': '/DC=org/DC=terena/DC=tcs/C=IT/O=Istituto '
                                    'Nazionale di Fisica Nucleare/CN=Stefano '
                                    'Belforte belforte@infn.it',
                       'creation_date': 1657813866,
                       'last_modification_date': 1657813866,
                       'last_modified_by': '/DC=org/DC=terena/DC=tcs/C=IT/O=Istituto '
                                           'Nazionale di Fisica '
                                           'Nucleare/CN=Stefano Belforte '
                                           'belforte@infn.it',
                       'migration_input': '/PAMinimumBias1/PARun2016C-v1/RAW#ea0d89b4-b2d4-11e6-91fa-001e67abf228',
                       'migration_request_id': 1786546,
                       'migration_status': 0,
                       'migration_url': 'https://cmsweb.cern.ch/dbs/prod/global/DBSReader',
                       'retry_count': 0},
 'migration_report': 'Migration request is started',
 'status': 'PENDING'}
belforte commented 2 years ago

To be clear: I have submitted requests to migrate 7 blocks in that AOD dataset, one at a time. Eventually those blocks got all migrated but in a few cases I got back the Id for the parent RAW block. Most cases where OK.

vkuznet commented 2 years ago

Stefano, I think it is an issue with your codebase. In another ticket you provided me your code and here is what you do:

if isinstance(result, list):
    result=result[0]
    status = result['migration_details']['migration_status']

In other words, you pickup first item from the return list which (since we migrate first parents) returns you a RAW dataset migration requests.

When you submit a request which in turns needs to migrate other datasets then you will get list of all migration requests which will include your dataset and all its parents. For instance, if I use curl for your dataset here is an actual output:

[
  {
    "migration_details": {
      "migration_request_id": 1,
      "migration_url": "https://cmsweb.cern.ch/dbs/prod/global/DBSReader",
      "migration_input": "/PAMinimumBias1/PARun2016C-v1/RAW#78fd15b2-b227-11e6-91fa-001e67abf228",
      "migration_status": 0,
      "create_by": "DBS-workflow",
      "creation_date": 1658236605,
      "last_modified_by": "DBS-workflow",
      "last_modification_date": 1658236605,
      "retry_count": 0
    },
    "migration_report": "Migration request is started",
    "status": "PENDING",
    "error": null
  },
  {
    "migration_details": {
      "migration_request_id": 2,
      "migration_url": "https://cmsweb.cern.ch/dbs/prod/global/DBSReader",
      "migration_input": "/PAMinimumBias1/PARun2016C-v1/RAW#ea0d89b4-b2d4-11e6-91fa-001e67abf228",
      "migration_status": 0,
      "create_by": "DBS-workflow",
      "creation_date": 1658236605,
      "last_modified_by": "DBS-workflow",
      "last_modification_date": 1658236605,
      "retry_count": 0
    },
    "migration_report": "Migration request is started",
    "status": "PENDING",
    "error": null
  },
  {
    "migration_details": {
      "migration_request_id": 3,
      "migration_url": "https://cmsweb.cern.ch/dbs/prod/global/DBSReader",
      "migration_input": "/PAMinimumBias1/PARun2016C-v1/RAW#54690bec-b2f5-11e6-91fa-001e67abf228",
      "migration_status": 0,
      "create_by": "DBS-workflow",
      "creation_date": 1658236605,
      "last_modified_by": "DBS-workflow",
      "last_modification_date": 1658236605,
      "retry_count": 0
    },
    "migration_report": "Migration request is started",
    "status": "PENDING",
    "error": null
  },
  {
    "migration_details": {
      "migration_request_id": 4,
      "migration_url": "https://cmsweb.cern.ch/dbs/prod/global/DBSReader",
      "migration_input": "/PAMinimumBias1/PARun2016C-v1/RAW#650f2d5e-b30f-11e6-91fa-001e67abf228",
      "migration_status": 0,
      "create_by": "DBS-workflow",
      "creation_date": 1658236605,
      "last_modified_by": "DBS-workflow",
      "last_modification_date": 1658236605,
      "retry_count": 0
    },
    "migration_report": "Migration request is started",
    "status": "PENDING",
    "error": null
  },
  {
    "migration_details": {
      "migration_request_id": 5,
      "migration_url": "https://cmsweb.cern.ch/dbs/prod/global/DBSReader",
      "migration_input": "/PAMinimumBias1/PARun2016C-PromptReco-v1/AOD#ae9a22f0-b51a-11e6-91fa-001e67abf228",
      "migration_status": 0,
      "create_by": "DBS-workflow",
      "creation_date": 1658236605,
      "last_modified_by": "DBS-workflow",
      "last_modification_date": 1658236605,
      "retry_count": 0
    },
    "migration_report": "Migration request is started",
    "status": "PENDING",
    "error": null
  }
]

Please double check your code and close this ticket since I don't see anything wrong with DBS server, and yes, the first item in return list is RAW block migration request.

belforte commented 2 years ago

I beg to disagree. I am aware of the fact that the server can return a long list here (and I do not like that, but that's another discussion which we still need to have), but in this case I verified that the list was one element long. As I noted, simpy repeating same call on different blocks of the dataset at times I got one result at times another.

That said, it was late night and mistakes get made. I will double check. But I also think that you should use cmsweb-testbed too in your testing. Behavior could be different if nothing else because testbed is still also getting some migration load from CRAB.

belforte commented 2 years ago

BTW.. yeha.. this case aside, I was naively expecting the first element in the list to be the block I asked for. I "discovered" that the output can be a very long list, but this is a big change from previous python-based server which you never mentioned and never documented ! I do not like having to look for the migrationId to pick among all that noise. Why do you tell me ? Why should I care ? I ask migration server to migrate one block, I only care to know when that block is available. If you need to migrate tons of parents, it is your problem, you need to track that in migration server side (that's why we have a migration server) and just tell me when all is done. Do you expect my client to track all those migrations and make sure they all completed and/or issue retries etc. etc. ? Sometimes DB's become unresponsive and things go bad, there will be errors, and retries will be needed, who should do them ? If you say "CRAB", it means CRAB needs to run its own migration server, at that point I may very well lookup ancestors myself and do one parentage layer at a time so I have control of the process, or I may very well decide that this is too much to add to my load and CRAB will simply publish w/o parentage information.

vkuznet commented 2 years ago

Stefano, I try to address both sides. On a server we need to know (when we debug its issues) what are the entire chain of migration is, while on client we care about initial request. But if migration fail you need both info to debug the issue. As such I think what is missing is change for status API to provide status about your concrete input. So far the status API only accepts migration request id. Since single migration request (e.g. a dataset) can cause chain of migration requests on a server, and moreover a dataset migration always resolve into migration of individual blocks, we can't use id as identified of migration. What I think is missing is the check by given migration input. I'll be happy to provide that. Here is how I envision status API should behave:

Please review and provide me your feedback and I'll adjust the code accordingly.

vkuznet commented 2 years ago

I checked the code and in fact everything is in place. The status API can do the following:

So, can you use /status?migration_input=<your_input> API call to check status of concrete block/dataset? This will be your initial migration input and it will not include all intermediate parents.

vkuznet commented 2 years ago

I updated documentation examples to show how to use status API with migration_input, e.g. /status?migration_input=<>

belforte commented 2 years ago

OK I can give it a try. Thanks. That should avoid the need for extensive code changes on my side. SOrry for being slow on following up on this. I really need to sped a lot of time with my family atm.

belforte commented 2 years ago

migration_input or block_name ? The DBSClient API requires the latter[1] (compatibility with python server, I guess) but it looks that when it passes it to server I get an error about "BLOCK_NAME": invalid identifier [2]

[1] https://github.com/dmwm/DBSClient/blob/be292dca5b82f72a8cf9bf13e7dc263232ab7b16/src/python/dbs/apis/dbsClient.py#L1815-L1830

[2] DBS Server error: [{'error': {'reason': 'DBSError Code:103 Description:DBS DB query error, e.g. mailformed SQL statement Function:dbs.executeAll Message: Error: ORA-00904: "MR"."BLOCK_NAME": invalid identifier\n', 'message': '', 'function': 'dbs.migrate.StatusMigration', 'code': 103}, 'http': {'method': 'GET', 'code': 400, 'timestamp': '2022-07-19 20:44:26.436756879 +0000 UTC m=+469619.048281327', 'path': '/dbs/int/phys03/DBSMigrate/status?block_name=/PAMinimumBias1/PARun2016C-PromptReco-v1/AOD#e680ed20-b30d-11e6-91fa-001e67abf228', 'user_agent': 'DBSClient/Unknown/', 'x_forwarded_host': 'cmsweb-testbed.cern.ch:8443', 'x_forwarded_for': '188.185.15.220', 'remote_addr': '188.185.121.219:44971'}, 'exception': 400, 'type': 'HTTPError', 'message': 'DBSError Code:103 Description:DBS DB query error, e.g. mailformed SQL statement Function:dbs.migrate.StatusMigration Message: Error: nested DBSError Code:103 Description:DBS DB query error, e.g. mailformed SQL statement Function:dbs.executeAll Message: Error: ORA-00904: "MR"."BLOCK_NAME": invalid identifier\n'}]

vkuznet commented 2 years ago

I was testing migration_input, and it works. But I'll check later block_name too. I think migration_input is better since it can be used for block and datasets values though.

belforte commented 2 years ago

i think migration_input works if sent via curl but python DBClient only accepts block_name now. I agree that being more general helps, but if a change in DBSClient is not required I can proceed more rapidly, since for DBS python client I need a new spec file in common COMP build: https://github.com/cms-sw/cmsdist/blob/comp_gcc630/py3-dbs3-client.spec

vkuznet commented 2 years ago

I'll make block_name working tomorrow. But in order to deploy new server I need schema change since I added parallelization and I'm awaiting this task to be done from DBA, therefore deployment of new DBS server is pending until schema will be altered.

belforte commented 2 years ago

no problem, I have a couple other pressing things to do

vkuznet commented 2 years ago

Stefano, I fixed status API to use block_name argument and deploy new version to testbed, e.g. you can do now https://cmsweb-testbed.cern.ch/..../status?block_name=<block>. We also clarified the current output which returns all intermediate parents for migration input, and final element in this list represent your initial migration input. Please let me know if anything needs to be done here, otherwise let's close this issue.

vkuznet commented 2 years ago

I also updated apis.md and MigrationServer.md documentation about status parameters, the former lists what is allowed, the later provide concrete examples.

belforte commented 2 years ago

thanks ! I still have to write new code, sorry. Am a snail trying to keep up with your cheetah speed. But let\s close. If I find problems, I will open new issues. I need to write a paper review tomorrow...