Open rdenaux opened 4 years ago
I have reservations about storing so much metadata in the user's FE, as that would heavily impact the performance of the plugin (both in terms of memory and communication times). Keep in mind that the current performance is already not ideal, although better than in the beginning. Would it not be enough to post the final credibility label that is actually shown to the user? At least for the first versions, I would minimize both the data received and the data posted on the FE plugin side.
The content collector (CC) will specify an enum for credibility values. It'd be nice to align this with the GW and FE, otherwise the GW will have to map between FE and CC enums.
The user accuracy labels (NOT credibility labels) Enum used in FE is:
[ accurate, accurate_with_considerations, unsubstantiated, inaccurate_with_considerations, inaccurate ]
But I do not understand why this is important in this endpoint, as for the first version I understood that the FE should just post the agree/disagree user accuracy values.
Also, as I understand it, the value of reviewRating.ratingValue would map as follows:
agree -> reviewRating.ratingValue: accurate
disagree -> reviewRating.ratingValue: inaccurate
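A minimal Python sketch of this mapping, assuming (as the issue description suggests) that an agree reaction maps to accurate and a disagree reaction to inaccurate. The enum values are the ones listed for the FE in this thread; the names `AccuracyLabel` and `reaction_to_rating` are hypothetical and not part of the FE or GW code.

```python
from enum import Enum


class AccuracyLabel(str, Enum):
    """User accuracy labels, as listed for the FE in this thread."""
    ACCURATE = "accurate"
    ACCURATE_WITH_CONSIDERATIONS = "accurate_with_considerations"
    UNSUBSTANTIATED = "unsubstantiated"
    INACCURATE_WITH_CONSIDERATIONS = "inaccurate_with_considerations"
    INACCURATE = "inaccurate"


def reaction_to_rating(reaction: str) -> AccuracyLabel:
    """Map a first-version agree/disagree reaction to a ratingValue.

    Assumption: only the two extreme labels are used in the first version.
    """
    mapping = {
        "agree": AccuracyLabel.ACCURATE,
        "disagree": AccuracyLabel.INACCURATE,
    }
    try:
        return mapping[reaction]
    except KeyError:
        raise ValueError(f"unsupported reaction: {reaction!r}") from None
```

If the CC, GW and FE agreed on one enum like this, no translation layer between FE and CC values would be needed.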
I have my thoughts against trying to store so much metadata in the users FE, that will heavily impact the performance of the plugin (both in terms of memory and communications times).
I agree, which means it may be a good idea for the GW to send you an (uu)id for each label returned. The FE can then send the user rating along with the id. The GW can either:
If you think that sending the full ModuleResponse along with the user accuracy rating is necessary to analyze the situation, we can study the options you described, but I want to insist that, at least as a first approach, it may be enough to send the final_credibility label that is shown to the user.
Then would something like this be enough?
{
"tweet_id": "string",
"actual_credibility": "not_credible",
"reaction": "agree"
}
That would simplify the situation and at the same time would let us achieve this consideration you made:
In order for the rating to be useful, we need to know what the label is that we showed to the user, as well as the tweet for which we generated the label.
Generating temporary ids would add complexity that may not be necessary for the moment.
OK with this as a useful (this should cover 80% of our needs for this type of user rating) temporary solution.
However, since we already need to adapt the API (and since the GW is already generating some form of ids: query_ids), I'd suggest including this in the current set of changes so we can add support for full traceability later on without having to change the FE. So the new request body would look something like:
{
"tweet_id": "string",
"rated_credibility": "not_credible",
"rated_moduleResponse": "943082ac-7ff8-5668-babe-4f23fe482b5c",
"reaction": "agree"
}
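As a rough illustration, the proposed request body could be modelled and checked like this in Python. The field names are taken from the body above; the class name `LabelEvaluation`, the helper `to_json` and the validation rules are assumptions made for this sketch, not agreed API behaviour.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class LabelEvaluation:
    """Hypothetical model of the proposed request body."""
    tweet_id: str
    rated_credibility: str
    rated_moduleResponse: str
    reaction: str

    def __post_init__(self):
        # Assumption: the first version only accepts agree/disagree.
        if self.reaction not in ("agree", "disagree"):
            raise ValueError(f"unsupported reaction: {self.reaction!r}")
        # Assumption: every field is required and must be non-empty.
        for name, value in asdict(self).items():
            if not value:
                raise ValueError(f"missing value for {name}")


def to_json(evaluation: LabelEvaluation) -> str:
    """Serialize the evaluation as the JSON body the FE would post."""
    return json.dumps(asdict(evaluation))
```

A caller would build `LabelEvaluation(...)` from the user's click and post `to_json(...)` to the endpoint; malformed input fails early instead of reaching the GW.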
Ok. I understand that this rated_moduleResponse value would be the query_id response parameter from the endpoint /twitter/tweet. I am not storing it right now, but that seems quite easy to maintain.
query_id is an option, but I think it would be better if we had a unique identifier for the ModuleResponses sent to users. This is because a single query_id can map to several ModuleResponse versions as modules complete their analysis, so the final credibility label may change.
Also, hopefully we (will) have some cache in place, so that several users requesting analysis of the same tweet receive the same response object instead of sending each tweet for analysis several times. I don't know enough about the GW implementation to know whether the query_id is appropriate or a new id is needed.
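One possible way to get an identifier that distinguishes successive ModuleResponse versions under the same query_id would be to derive it from the response content. The sketch below only illustrates that idea; it is not how the GW works. The helper `response_version_id` is hypothetical, and the shape of the response dict is made up for the example.

```python
import hashlib
import json


def response_version_id(query_id: str, module_response: dict) -> str:
    """Derive a stable id for one version of a ModuleResponse.

    The response is serialized canonically (sorted keys, no extra
    whitespace) so that the same content always yields the same id,
    while any change in the content yields a different one.
    """
    canonical = json.dumps(module_response, sort_keys=True, separators=(",", ":"))
    payload = f"{query_id}:{canonical}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

With such a scheme, two users served the same cached response would report the same id, and a later response with a changed final credibility label would get a new one.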
Firstly, we're a bit unsure how rated_moduleResponse is supposed to be used. The module responses are only cached for 24h in the database cache, and the response is updated after each module sends its response (if a module sends multiple responses for a single query, only the first one received is handled). The gateway has no way of determining whether the browser used the latest version of the moduleResponse. This makes rated_moduleResponse hard to use for any purpose by itself.
If we do need a saved instance of a specific moduleResponse there is currently nothing in place to handle such a feature. Is the GW really the right place to set up such a database? Since I don't think this information is ever used by the FE or GW, would it not be better to store it in the same database as the reviews?
From an evaluation body like:
{
"tweet_id": "string",
"rated_credibility": "not_credible",
"rated_moduleResponse": "943082ac-7ff8-5668-babe-4f23fe482b5c",
"reaction": "agree"
}
the gateway would be able to attach the contents of the cache for the response to the reports, with two edge cases where the cache contents differ from the FE version.
For case 1: if the FE attaches the module_labels field from the query, you can see which module responses were processed in the assessment, and other modules' responses can be disregarded.
For case 2: this seems like a quite unlikely scenario, and the only mitigations I can think of are quite costly in performance or development time. For the moment it might be acceptable to ignore it.
Secondly, in the author field, I'm wondering about the url field. Do we really want to publish any information about our users publicly? Do we really need a URL at all? Is an identifier not enough?
Hi, I think both edge cases for sending the moduleResponse are reasonable. The backend can ignore it if not available or if it doesn't match the rated_credibility.
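The "attach if available, ignore otherwise" behaviour could look roughly like the following Python sketch. The cache is modelled as a plain dict keyed by the rated_moduleResponse id, and the cached response is assumed to carry a `final_credibility` field using the same value format as rated_credibility; both are assumptions for illustration, as is the function name `attach_cached_response`.

```python
def attach_cached_response(evaluation: dict, cache: dict) -> dict:
    """Return a copy of the evaluation, with the cached response attached
    only when it is still available and consistent with what the user rated.
    """
    report = dict(evaluation)
    cached = cache.get(evaluation["rated_moduleResponse"])
    if cached is None:
        # Cache entry expired (e.g. after 24h) or never existed: skip it.
        return report
    if cached.get("final_credibility") != evaluation["rated_credibility"]:
        # The cached version no longer matches the label the user saw.
        return report
    report["moduleResponse"] = cached
    return report
```

In both skipped cases the rating itself is still forwarded, so no user feedback is lost when the cached response cannot be trusted.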
Sure, the user's url is optional/private. If we ever want to show a user profile, we could add it with the appropriate authorization rules.
@rdenaux, let me check that I understand this correctly now. The body below is sent to the GW by the BP when a logged-in user has reviewed a CoInform system rating:
{
"tweet_id": "string",
"rated_credibility": "not_credible",
"rated_moduleResponse": "943082ac-7ff8-5668-babe-4f23fe482b5c",
"reaction": "agree"
}
tweet_id == id of the tweet
rated_credibility == final_credibility? (what our system sends to the BP)
rated_moduleResponse == query_id? (if not, what is it?)
reaction == the BP user's reaction to the CoInform system's rated_credibility
Upon receiving this the GW builds a request as below:
{
"context": "http://schema.org",
"type": "CoinformUserReview",
"author": {
"context": "http://coinform.eu",
"type": "CoinformUser",
"url": "http://coinform.eu/users/cf40ced2-7cc8-98a3-1fb0-02c96d781626",
"identifier": "cf40ced2-7cc8-98a3-1fb0-02c96d781626"
},
"name": "accurate",
"reviewAspect": "accuracy",
"reviewRating": {
"context": "http://coinform.eu",
"type": "CoinformAccuracyRating",
"ratingValue": "accurate",
"reviewAspect": "accuracy"
},
"itemReviewed": {
"context": "http://coinform.eu",
"type": "CredibilityReview",
"url": "https://api.coinform.eu/response/251b6a72cd3a3af314baf748abdfac93c076e54272dabcaef5b7b115eb65c848/x",
"name": "not credible",
"itemReviewed": {
"context": "http://coinform.eu",
"type": "Tweet",
"url": "https://twitter.com/realDonaldTrump/status/1181172459325800448",
"identifier": "1181172459325800448"
},
"identifier": "251b6a72cd3a3af314baf748abdfac93c076e54272dabcaef5b7b115eb65c848"
}
}
and passes it on to the relevant backend (in this case the content-collector-api) as you mentioned above.
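The conversion described here could be sketched in Python as below. The output keys follow the JSON shown above; the function name `build_user_review`, the agree/disagree mapping table and the rule for deriving the label name from rated_credibility are assumptions for illustration. The user URL and the CredibilityReview URL are left out, and the tweet URL is only included when the caller supplies it.

```python
from typing import Optional

# Assumption: first-version reactions map to these two accuracy values.
_REACTION_TO_RATING = {"agree": "accurate", "disagree": "inaccurate"}


def build_user_review(evaluation: dict, user_id: str,
                      tweet_url: Optional[str] = None) -> dict:
    """Convert an FE evaluation body into a CoinformUserReview-shaped dict."""
    rating = _REACTION_TO_RATING[evaluation["reaction"]]  # KeyError if unknown
    tweet = {
        "context": "http://coinform.eu",
        "type": "Tweet",
        "identifier": evaluation["tweet_id"],
    }
    if tweet_url is not None:
        tweet["url"] = tweet_url
    return {
        "context": "http://schema.org",
        "type": "CoinformUserReview",
        "author": {
            "context": "http://coinform.eu",
            "type": "CoinformUser",
            "identifier": user_id,
        },
        "name": rating,
        "reviewAspect": "accuracy",
        "reviewRating": {
            "context": "http://coinform.eu",
            "type": "CoinformAccuracyRating",
            "ratingValue": rating,
            "reviewAspect": "accuracy",
        },
        "itemReviewed": {
            "context": "http://coinform.eu",
            "type": "CredibilityReview",
            # Assumption: the display name is the label with spaces.
            "name": evaluation["rated_credibility"].replace("_", " "),
            "itemReviewed": tweet,
            "identifier": evaluation["rated_moduleResponse"],
        },
    }
```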
That's right.
rated_moduleResponse == query_id? (if not, what is it?)
This is mainly for the GW to retrieve/reconstruct the CredibilityReview that was sent to the user, if possible. So, if the query_id is sufficient, you can use that. I think that if you use it as the CredibilityReview.identifier, the CC can request the response from the GW and store more information in the DB (again, if still available), as this may be useful for figuring out which of the submodules needs to be improved.
OK. I'm pretty much done with the basic implementation of this. I will do a bit of testing on it tomorrow and then try to deploy after lunch. The only things from the JSON schema above that I have not implemented are the url for the user and the url for itemReviewed, since the GW does not get these sent to it at the moment. If this is imperative, we can include it in a future deploy. @rdenaux
url for user is no big issue since we haven't agreed on a route for this anyway. The same is true for URLs for the CredibilityReview. However, it would be good to have a URL for the tweet, if possible.
Then my suggestion is that the BP passes the URL along with the initial request to the GW. @aleixac
Right now the BP does not parse/store the tweet URL, but it should not be a problem to find it for every tweet and pass it to the GW for some endpoints. Just as a note here, the tweet URL format is: "https://twitter.com/{USER_SCREEN_NAME}/status/{TWEET_ID}"
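For illustration, building the URL from that format could look like this in Python. The helper name `tweet_url` and the basic input checks are assumptions for the sketch, not existing BP code.

```python
def tweet_url(screen_name: str, tweet_id: str) -> str:
    """Build a tweet URL from the user's screen name and the tweet id."""
    if not screen_name:
        raise ValueError("screen_name must not be empty")
    # Assumption: tweet ids are passed around as strings of digits.
    if not tweet_id.isdigit():
        raise ValueError(f"tweet_id must be numeric: {tweet_id!r}")
    return f"https://twitter.com/{screen_name}/status/{tweet_id}"
```

For the tweet used in this issue, `tweet_url("realDonaldTrump", "1181172459325800448")` gives the URL shown in the itemReviewed example.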
yepp. at the moment it is only the /twitter/evaluate/label endpoint that requires it as I understand
To help with refining the definition of the /twitter/evaluate/label endpoint, this issue describes a test case where a user submits an accuracy rating for a credibility assessment. It also describes the desired outcome, namely a corresponding call to the content collector API.
Context
A co-inform plugin user has internal UUID cf40ced2-7cc8-98a3-1fb0-02c96d781626. The user has logged into the plugin and views this tweet. The plugin calls /twitter/tweet, which is assigned a query id as follows:
The plugin then retrieves the response via the API, which returns the following credibility assessment:
The FE uses this to show a label 'not credible' for the tweet. The user clicks on a button to **agree** with this credibility assessment, which triggers the FE to call the proposed [`twitter/evaluate/label`](https://co-inform.github.io/gateway-api/#/to%20be%20developed/post_twitter_evaluate_label) endpoint.
Desired outcome
The GW should issue the following call to content collector api, corresponding to the data the user entered.
The GW calls /user/accuracy-review with a POST and the following content object:
Variant
Note that if the user disagrees, the only changes from the case described above are that name will have the value inaccurate and reviewRating.ratingValue will also have the value inaccurate.
Note
The content collector (CC) will specify an enum for credibility values. It'd be nice to align this with the GW and FE, otherwise the GW will have to map between FE and CC enums.
Issue
When the user clicks on the submit button, the FE calls /twitter/evaluate/label via POST. Please verify that the data submitted by the FE is sufficient to allow the GW implementation to convert it to the format required by the content collector.
Important Consideration
In order for the rating to be useful, we need to know what the label is that we showed to the user, as well as the tweet for which we generated the label.
Via the itemReviewed.url, the content collector may try to reconstruct the data that was sent to the FE. However, this is dangerous, since the response may have changed as new results become available. Therefore, besides sending the itemReviewed.url and itemReviewed.identifier, it may be better to also forward the full ModuleResponse that was sent to the FE. I'd introduce a property itemReviewed.moduleResponse for this. This requires the FE to keep the ModuleResponse and send it along with the rating. The main advantage of having this data is that we should be able to see whether some partial responses perform better than others.