bitcointranscripts / transcription-review-backend

7 stars 11 forks source link

No validation for transcript's metadata when creating transcript #221

Closed kouloumos closed 10 months ago

kouloumos commented 10 months ago

I observed an issue on the UI that has to do with how transcripts' speakers are parsed and displayed. Looking into it I realize that it arises because of how we validate (or not validate) the metadata that are pushed to the queue.

In this screenshot you can see two transcripts, one shows speakers, the other does not: image

But in the database, both have speakers, but the speakers data are not stored in the same way and that's from where the issue arises:

// The Bitcoin Development Kit
"speakers": [
    "Alekos Filini",
    "Daniela Brozzoni"
  ],
// FROST Implementers Round Table I
"speakers": "['Christopher Allen', 'Jesse Posner', 'Conrado Gouvêa', 'Ian Goldberg', 'Chelsea Komlo', 'Tim Ruffing']",

It seems that the backend is not validating the format of speakers, therefore accepting both types of values. And based on the current content of both staging&production it seems that people have been pushing both formats.

A weird related behavior, is that the "missing" speakers are displayed in the "Editor" and in the "My Account" page, if the user claims that transcript, which probably means that the speakers are parsed differently in those pages: ezgif com-video-to-gif(1)

For now, we can just fix the formats directly in the database and speakers will be displayed correctly. But to solve this, I think that we should:

kouloumos commented 10 months ago

I think that I might have not understood correctly the source of the issue that I am observing on the UI.

These are the current transcript on production: image

Based on what I said above, the transcripts for which you can see the speakers, should have the "correct format" in the database, but this is not true. This is the speakers format for the last one:

// Challenges in implementing coinjoin in hardware wallets - Pavol Rusnak
"speakers": "['Pavol Rusnak']"

But another transcript that has the same format, doesn't display speakers correctly:

// Weighing Transactions, The Witness Discount -  Murch - TABConf 2022
"speakers": "['Mark Erhardt']"

It seems that the issues is a frontend-backend combination.

Extheoisah commented 10 months ago

Good catch. We actually should do some validation. However, it is also expected that tstbtc gives the same structure of data for the metadata. In the case of speakers as you highlighted, tstbtc sometimes formats it as a string instead of an array. The frontend expects an array of speakers and that is why it shows as "N/A" if the backend returns a string. In this light, there are three actionable items:

  1. fix the metadata results in tstbtc so that we always get a consistent metadata structure.
  2. add validation on the backend for metadata to match the expected structure from tstbtc
  3. update the current data in the db so the frontend is able to parse the metadata
kouloumos commented 10 months ago
1. fix the metadata results in tstbtc so that we always get a consistent metadata structure.

Already working on this. tags, categories and speakers will be send as an array. What if there is None? Up until now we were sending speakers: "None". Should we prefer an empty array speakers: []?

Emmanuel-Develops commented 10 months ago

Already working on this. tags, categories and speakers will be send as an array. What if there is None? Up until now we were sending speakers: "None". Should we prefer an empty array speakers: []?

Empty array is a good option.

Regarding why the authors show in the editor but not in the table, I wrote some logic that parses that stringified array to a normal array we can use on the FE, it was mostly a temp solution so it's good we are addressing it now.

Extheoisah commented 10 months ago

1.fix the metadata results in tstbtc so that we always get a consistent metadata structure. 2.add validation on the backend for metadata to match the expected structure from tstbtc 3.update the current data in the db so the frontend is able to parse the metadata

tasks 2 and 3 is complete. moving task 1 to tstbtc repo and closing this