Open vmaurin opened 1 month ago
Dear Vincent,
Due to the statements table being excessively large with 20 billion records, the index checks were causing instability and had to be removed. We have now implemented a two-layer protection system to prevent duplicate statements, using a Redis cache and a database query in a specific table.
If you send a duplicate statement, you will receive a 422 error code indicating that this statement of reasons is already known. This should enable you to save the submission state on your side to avoid resending the same statements repeatedly.
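As an illustration of saving the submission state on the client side, here is a minimal Python sketch. The function name `record_result` and the state labels are hypothetical. It assumes that any 2xx status means the statement was accepted and that every 422 means a duplicate, which may be too coarse if 422 is also returned for other errors.

```python
def record_result(state: dict[str, str], puid: str, status_code: int) -> bool:
    """Store the outcome of one submission; return True if the PUID is settled.

    Assumptions (not confirmed by the API documentation quoted here):
    any 2xx means accepted, and every 422 means "already known".
    """
    if 200 <= status_code < 300:
        state[puid] = "submitted"
    elif status_code == 422:
        # Duplicate: the statement is already known, so do not resend it.
        state[puid] = "already_known"
    else:
        # Anything else is kept for a later retry.
        state[puid] = "retry"
    return state[puid] != "retry"
```

Persisting `state` between runs (for example in a local file or table) is what would stop the same statements from being resent.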
Hi @alainvd
Thank you for your fast response!
For submissions, we already follow your guidance, and it seems to work well.
The issue is about /api/v1/statement/existing-puid/<PUID>
documented here https://transparency.dsa.ec.europa.eu/page/api-documentation#existing-puid
The documentation states:
There is an end point that will allow you to check if a PUID value is already used.
But that is not really true, because the endpoint queries the Elasticsearch index. If the behavior cannot be fixed for performance reasons, maybe the documentation should be updated?
Something like:
There is an endpoint that allows you to get a SoR by PUID. Note that SoRs become available through this endpoint only after X hours, or after midnight on the day they were submitted.
From a user's perspective, the current documentation does not make it clear that this endpoint reads from a different "database", and it gives no clear indication of the indexing frequency or schedule.
Looking at the source, it seems that existing-puid is based on OpenSearch, which defeats the pattern where one checks whether a PUID exists before posting it, and also prevents people from looking up a SoR they have submitted. In addition, since the data does not seem to be indexed continuously, it is very complicated to check a statement of reasons that was just sent.
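To make the check-before-post pattern concrete, here is a minimal Python sketch. The helper names and the injected `exists` and `submit` callables are hypothetical, and the base URL is an assumption taken from the host of the documentation link. Because the lookup may lag behind submissions, the sketch still treats a 422 on submission as the authoritative duplicate signal.

```python
from typing import Callable
from urllib.parse import quote

# Assumption: the API lives on the same host as the documentation page.
BASE_URL = "https://transparency.dsa.ec.europa.eu"


def existing_puid_url(puid: str, base_url: str = BASE_URL) -> str:
    """Build the lookup URL, percent-encoding the PUID as one path segment."""
    return f"{base_url}/api/v1/statement/existing-puid/{quote(puid, safe='')}"


def submit_if_unknown(
    puid: str,
    exists: Callable[[str], bool],
    submit: Callable[[str], int],
) -> str:
    """Check for the PUID first, then submit; return a short outcome label."""
    if exists(puid):
        return "skipped"
    status = submit(puid)  # hypothetical callable returning the HTTP status
    if status == 422:
        # The lookup said "unknown" but the store already has the statement:
        # this is what happens when the search index has not caught up.
        return "duplicate"
    return "submitted" if 200 <= status < 300 else "failed"
```

With a lookup that is only refreshed periodically, the `exists` call cannot be trusted for recently sent statements, so the "duplicate" branch is the one that actually protects against resending.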
Could it be based on a database table instead (for example, by adding an index on the statement table to search by platform_id, pid)? I understand that there is an archiving system in place on this table, but maybe the lookup should hit this table first, to be consistent with the store method.
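The database-backed lookup suggested above could look roughly like the following Python sketch. It uses an in-memory SQLite database purely for illustration. The real system presumably uses a different database, and the table name `statements`, the column name `puid`, and the index name are hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an illustrative statements table with a composite index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE statements ("
        "id INTEGER PRIMARY KEY, "
        "platform_id INTEGER NOT NULL, "
        "puid TEXT NOT NULL)"
    )
    # Composite index so the lookup by (platform_id, puid) avoids a full scan.
    conn.execute(
        "CREATE INDEX idx_statements_platform_puid "
        "ON statements (platform_id, puid)"
    )
    return conn


def puid_exists(conn: sqlite3.Connection, platform_id: int, puid: str) -> bool:
    """Return True if the platform has already stored a statement with this PUID."""
    row = conn.execute(
        "SELECT 1 FROM statements WHERE platform_id = ? AND puid = ? LIMIT 1",
        (platform_id, puid),
    ).fetchone()
    return row is not None
```

A lookup like this sees a statement as soon as it is stored, which is the consistency with the store method that the question is asking for. Whether such an index is affordable on a table of this size is a separate question that the sketch does not address.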