GALAglobal / TAPICC-API-implementation

TAPICC API implementation using node.js framework sails.js

Setting up a Job vs. Creating a Job #46

Open ysavourel opened 6 years ago

ysavourel commented 6 years ago

I think we may still need that /jobs/{jobId}/submit command we had a while back.

Creating a job is one POST call, but then we need to upload all the assets for that job and then create all the tasks for each asset. We will likely need a way to indicate to the TMS side that we are done adding assets and tasks for a given job. A bit like staging the job and then committing it.

Otherwise the TMS side may have a hard time organizing its own structure. For example, the TMS may need to create one separate project for each set of files with the same source language. Another case may be a TMS that needs to group files per language pair. Etc.

In other words, the TMS may need to know about the whole job before it can actually start creating its internal structures.

The steps of setting up a job would be something like this:

POST /jobs (get back the job ID)
for each asset in the job:
    POST /jobs/{jobId}/assets/uploadFile (get back the asset ID)
    for each target language for that asset:
        POST /assets/{assetId}/tasks (get back the task ID)
POST /jobs/{jobId}/submit (to commit the job)
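The sequence above could be sketched in Node.js roughly as follows. The endpoint paths come from the steps listed; the payload shapes, the `{ id }` response field, and the injected `post` helper (e.g. a thin wrapper around an HTTP client) are assumptions for illustration only, not part of the spec:

```javascript
// Sketch of the job set-up sequence: create, populate, then commit.
async function setUpJob(post, job) {
  // POST /jobs (get back the job ID)
  const { id: jobId } = await post('/jobs', { name: job.name });
  for (const asset of job.assets) {
    // POST /jobs/{jobId}/assets/uploadFile (get back the asset ID)
    const { id: assetId } = await post(
      `/jobs/${jobId}/assets/uploadFile`, { file: asset.file });
    for (const target of asset.targets) {
      // POST /assets/{assetId}/tasks (get back the task ID)
      await post(`/assets/${assetId}/tasks`,
        { taskType: 'translation', targetLanguage: target });
    }
  }
  // POST /jobs/{jobId}/submit: signal that no more assets/tasks are coming
  await post(`/jobs/${jobId}/submit`, {});
  return jobId;
}
```

The submit call is the "commit" of the staged job: everything before it is preparation, and the server is free to defer building its internal structures until it arrives.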

This also means that we may need some value indicating that the progress of the tasks is not even started yet (while the job setup is being done). Something maybe like new or not-submitted-yet.

Alino commented 6 years ago

so visiting GET /jobs/{jobId}/submit would change some property of a Job to mark it as submitted? Would that be locking the Job, so it would not be possible to modify it anymore and not possible to create/update Assets in it? Or can the Job be resubmitted with the same route?

Is the TMS supposed to recreate the structure of a TAPICC Job after visiting this route? Do you have an idea how that would work? Via a webhook?

ysavourel commented 6 years ago

> so visiting GET /jobs/{jobId}/submit would change some property of a Job to mark it as submitted? Would that be locking the Job, so it would not be possible to modify it anymore and not possible to create/update Assets in it? Or can the Job be resubmitted with the same route?

I’m not sure if we would need a job-level flag. Having the status of each task set to a value indicating the job is not submitted yet might be enough. As for updates, I guess that goes back to issue #41. I don’t think how we create the job itself affects how you can update it. But it’s a good point: we have not really discussed how updates (or lack thereof) would be done.

> Is the TMS supposed to recreate the structure of a TAPICC Job after visiting this route? Do you have an idea how that would work? Via a webhook?

We do have two structures: the one of TAPICC and the one of the TMS. There is very little chance that they match completely. So most likely we have to keep track of the TAPICC data (at least some of it) separately from the TMS structure. But from the viewpoint of the TAPICC client that should be transparent: for example, the call to get the status of a translation task for a given file and a given target is simply routed to whatever structure the TMS uses. I don’t think you need to re-convert to TAPICC, or use webhooks, for all that.

This is where we really need the input of developers who would implement TAPICC with their systems: we can’t imagine all the ways things could be done. At least I can’t: my experience of connecting CMSes and TMSes is limited to a few systems on each side, and likely rather small compared to all the systems out there.

Maybe an example would help here:

At Argos, one of the TMSes we use works the following way: the top unit is a “project”. A given project can have one or more “batches”. A batch is a set of identical source files with a single source language and one or more targets, going through one given workflow. We can construct a batch in several steps (file by file if needed, add targets, etc.), but after that all processes are usually done for all files at once, or at least by language pair.

So, let’s say we get a TAPICC job like this:

We have to re-structure the 7 tasks of the TAPICC data into something like this:

So we would associate each of the items in the batches with a task, and just work through that link as needed.
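As a hypothetical illustration of that restructuring (the original example tables are not shown here, so all field names are invented): the flat TAPICC tasks could be grouped into TMS batches, one batch per source language, keeping each TAPICC task id so later status calls can be routed back through that link:

```javascript
// Group flat TAPICC tasks into TMS-style batches keyed by source language.
// Each batch item keeps the TAPICC task id as the link back to TAPICC.
function groupIntoBatches(tasks) {
  const batches = new Map();
  for (const task of tasks) {
    if (!batches.has(task.sourceLanguage)) {
      batches.set(task.sourceLanguage,
        { sourceLanguage: task.sourceLanguage, items: [] });
    }
    batches.get(task.sourceLanguage).items.push({
      taskId: task.id,           // link back to the TAPICC task
      file: task.file,
      target: task.targetLanguage,
    });
  }
  return [...batches.values()];
}
```

Note this only works reliably if the grouping runs once the whole job is known, which is exactly why some kind of "done submitting" signal matters.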

Alino commented 6 years ago

Sorry, I feel lost here. Can you please answer my questions so I can see it more clearly?

  1. What exactly should be the function of this endpoint? /job/{jobId}/submit
  2. What should trigger the migration of Job data from TAPICC to the TMS?

> I’m not sure if we would need a job-level flag. Having the status of each task set to a value indicating the job is not submitted yet might be enough.

ysavourel commented 6 years ago

> What exactly should be the function of this endpoint? /job/{jobId}/submit

It is sent by the creator of the job, when all the job’s components (assets and tasks) have been uploaded and created. POST /jobs starts the process of creating the job, and POST /jobs/{jobId}/submit concludes it.

It would work roughly this way:

> What should trigger the migration of Job data from TAPICC to the TMS?

If by migration you mean whatever process needs to happen to make the TAPICC job’s data a “real” job for the TMS, the answer is: the submit call. If some systems do not need such a trigger and can handle the bits and pieces of the new job as they arrive, good for them. They can just ignore the submit call then.

> I’m not sure if we would need a job-level flag. Having the status of each task set to a value indicating the job is not submitted yet might be enough.

> Are you talking about an already existing Task.progress attribute, or is it some new Task.status attribute which should be created? What is the benefit of having this value on Task level, instead of Job level?

I’m talking about Task.progress: it would probably need a value to indicate that a task is not yet “ready” to be processed (maybe new), and then that progress field would be set to pending when the server receives the submit call.

I only mentioned a possible status at the job level because we don’t currently have a direct way to know whether a job has been submitted or not (the only way would be to look at the Task.progress values, and/or whether it has any assets and tasks at all).

I hope this helps.
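The progress transition described above could be sketched like this (a simplified in-memory version assuming new and pending as the two values; not the actual implementation):

```javascript
// On submit, every task still marked "new" becomes "pending",
// i.e. ready for the host to process; other progress values are untouched.
function submitJob(job) {
  for (const task of job.tasks) {
    if (task.progress === 'new') task.progress = 'pending';
  }
  return job;
}
```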

Alino commented 6 years ago

Thanks, I believe it helped me to understand better.

> If by migration you mean whatever process needs to happen to make the TAPICC job’s data a “real” job for the TMS, the answer is: the submit call. If some systems do not need such a trigger and can handle the bits and pieces of the new job as they arrive, good for them. They can just ignore the submit call then.

> I’m talking about Task.progress: it would probably need a value to indicate that a task is not yet “ready” to be processed (maybe new), and then that progress field would be set to pending when the server receives the submit call. I only mentioned a possible status at the job level because we don’t currently have a direct way to know whether a job has been submitted or not (the only way would be to look at the Task.progress values, and/or whether it has any assets and tasks at all).

What if we ignore Task.progress in this matter, and instead create a new attribute Job.submittedAt, which would be a date-time? (We had this property, but I deleted it, because I thought it was the same thing as createdAt.) From now on, this property would be filled with a date-time after the job is submitted via the API endpoint.

Then we can create another boolean attribute called Job.changedSinceLastSubmit. This would be set to false by default, but as soon as there is a modification done to the Job object, or any associated object such as an Asset or Task (creation, deletion, modification), it would be set to true. This way we would know whether something has changed in the Job data since it was last submitted.
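A rough in-memory sketch of how those two attributes could be maintained (plain objects here; in the actual app they would be Job model fields, with the change hook wired into Asset/Task lifecycle callbacks, and the before-first-submit behavior of the flag is my own assumption):

```javascript
// Called by the submit endpoint: stamp the time and reset the change flag.
function markSubmitted(job, now = new Date()) {
  job.submittedAt = now.toISOString();
  job.changedSinceLastSubmit = false;
  return job;
}

// Called on any create/update/delete of the Job, its Assets or Tasks.
// Before the first submit there is nothing to diverge from, so the flag
// stays false (an assumption, not settled in the discussion).
function markChanged(job) {
  if (job.submittedAt) job.changedSinceLastSubmit = true;
  return job;
}
```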

ysavourel commented 6 years ago

> So, we will probably need to have a webhook event associated with job submit, let's call it "JobSubmitted" event. When the job submit endpoint is opened, it should trigger this webhook so that the TMS can obtain all Job data, including Assets and Tasks. So that it can recreate this structure in itself, as it needs. (The TMS would have to be compatible with TAPICC, in such a way it would need to be able to create the structure from the webhook)

I'm afraid I don't understand the webhook's purpose. To me a webhook is a callback URL that a client of a TAPICC server registers with the TAPICC server, so that when an event occurs, that client is notified. So, in the scenario of a CMS (the client) creating a job in the TMS (the TAPICC server), I don't understand why the submit call would trigger a webhook for the TMS. The TAPICC server doesn't need a webhook for itself. It can just act when it receives the submit... Or maybe I'm missing something.

> What if we ignore Task.progress in this matter, and instead create a new attribute Job.submittedAt, which would be a date-time? (We had this property, but I deleted it, because I thought it was the same thing as createdAt.) From now on, this property would be filled with a date-time after the job is submitted via the API endpoint.

A Job.submittedAt would be fine. But I'm not sure it can replace a "new"-like value for Task.progress. Not having a distinct indicator at the task level for tasks that are not ready yet would make detecting the "status" of a task complicated (one would also need to access the job).

> Then we can create another boolean attribute called Job.changedSinceLastSubmit

I guess that goes back to the discussion about how to do updates.

Alino commented 6 years ago

> I'm afraid I don't understand the webhook's purpose. To me a webhook is a callback URL that a client of a TAPICC server registers with the TAPICC server, so that when an event occurs, that client is notified. So, in the scenario of a CMS (the client) creating a job in the TMS (the TAPICC server), I don't understand why the submit call would trigger a webhook for the TMS. The TAPICC server doesn't need a webhook for itself. It can just act when it receives the submit... Or maybe I'm missing something.

The idea was that the TAPICC server would send a webhook to the TMS (or to some middleware between TAPICC and the TMS, for example zapier.com) so that the TMS can recreate the structure in its database, or do whatever it wants, by acting on the webhook (the TAPICC webhook would send all required data to the TMS). But maybe it's a bad idea and we shouldn't expect TMS systems to make the extra effort of supporting this TAPICC webhook. Rather, a specific TAPICC implementation should adapt to the API of the TMS to recreate the structure in the TMS.

In other words, I think there are 3 options for how the TMS gets the structure from TAPICC created:

  1. TAPICC webhook sent to middleware such as zapier.com
  2. TAPICC webhook sent directly to the TMS
  3. TAPICC implementation uses the API of the TMS

> A Job.submittedAt would be fine. But I'm not sure it can replace a "new"-like value for Task.progress. Not having a distinct indicator at the task level for tasks that are not ready yet would make detecting the "status" of a task complicated (one would also need to access the job).

Yes, it's true that one would also need to access the Job. But what about Assets? Do we need this information on Assets too, or only on Tasks? To me it seems easier to reason about whether all of these objects were submitted if their parent object (the Job) carries the indicator. In what scenario does someone need to know whether the Job of a Task has been submitted?

Alino commented 6 years ago

Sorry, I am tired. I think I missed the part where you say TAPICC is the TMS. I thought we needed to send job data from the TAPICC server to some other server (the TMS) which might have a different data structure (the batches example you used before).

ysavourel commented 6 years ago

I see. Then yes, what you said would have made sense. I guess we are back on the same page.

Alino commented 6 years ago

In your very first post, does TMS mean TAPICC or something else? Probably I am also confused by our terminology. I thought TAPICC is not a TMS, if I remember correctly from our last group call, but a bridge between TMSes and/or CMSes?

ysavourel commented 6 years ago

TMS means a normal TMS, but one that also includes a TAPICC server component. How exactly they work together is up to the implementer.

So, yes, TAPICC is a "bridge" between the CMS and the TMS in the sense that the CMS is the TAPICC-client while the TMS includes the TAPICC-server.

mesztam commented 5 years ago

I got lost a bit. So do we need a /submit action, or e.g. a specific job state, or anything else to trigger the TMS to start the translation job? Or is the job started as soon as the child tasks/assets are fully set up? (How do we indicate that a task is ready to start?) Or should this be fully async, with the different tasks starting as soon as their input is in place, while others are still being set up? (In some TMS systems' APIs this is not possible, so you cannot extend an existing project with new inputs once the project is started.)

ysavourel commented 5 years ago

Since I'm getting back to working on a TAPICC implementation, I'll try to follow up on this, which is still not resolved as far as I understand.

The /submit call would let the TAPICC server (where the job is created) know that there are no more inputs or tasks to be associated with the job being created. So, yes, in a sense it is a change of status for the job. Note that the client would send that call only when it has received back the results for all the task and input creation calls it sent.

Alino commented 5 years ago

from today's meeting

It looks like we do not need that "submission is done" call, because a Job can be open forever.

ysavourel commented 4 years ago

I have to re-open this issue because I found a case where knowing, at the task level, that a task is ready is not enough:

This is a fairly typical project, even probably the main use case for us.

Then our internal system would very much like to treat those three tasks within a single project. It makes no sense for us to have three separate projects for this.

The problem: How do we know the client is done with submitting tasks?

jcompton-moravia commented 4 years ago

I'm attaching a discussion thread that took place on Skype between Yves, Alex, Wojciech and myself, where the conclusion of the discussion seems to be: let's add a property to the Job object that would indicate whether a Job is an "open" or "sealed" package.

The "work protocol"/use case/model we're trying to support is the "classic localization project", where none of the tasks within the job should be executed until all tasks are present. This is opposed to the "continuous localization" model, where tasks should be executed as soon as they are assigned, with no dependency on any other task.

The "open/sealed" analogy treats the Job like a package. If it is "open", then any number of tasks can be added to it for as long as it remains open. If it is "sealed", then whatever tasks are in there are the tasks that represent the Job.

Attachment: Skype Transcript about Jobs.txt

ysavourel commented 4 years ago

Looking more at this issue, I'm not sure a new property with an open/sealed value at the job level corresponds exactly to what we need. It seems to assume the job cannot be changed anymore. But in reality we are just trying to set a trigger for a batch of posts. Early on, I believe Alex was proposing a specific endpoint for the given job: when called, it means the submitter is done submitting and the host can start the tasks (if it needed to wait for that signal). Hosts that can start each task as it comes can simply ignore that request. This would allow the requester to add tasks later on if needed. It doesn't "close" the job. The issue is more about a process than a state.
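Under that "trigger, not state" reading, the submit handler might look roughly like this (illustrative names and progress values, not a settled design): each call starts whatever tasks have been posted so far, but leaves the job open so that more tasks can be added and a later submit picks them up.

```javascript
// Submit as a repeatable trigger: start tasks posted so far,
// never "seal" the job.
function submitBatch(job) {
  const started = [];
  for (const task of job.tasks) {
    if (task.progress === 'new') {
      task.progress = 'pending'; // the host may now start work on it
      started.push(task.id);
    }
  }
  return started; // the job itself stays open for further tasks
}
```

Hosts that process each task as it arrives could treat this call as a no-op, which matches the continuous-localization model.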