Open kouloumos opened 1 week ago
@kouloumos, based on your Proposed Solution for the script here, I'm thinking of updating the value of the url
key to the link of the main/first post for the given title:
feed_data = {
...
'url': <The link of first/main post (instead of latest post)>,
...
}
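A minimal sketch of how the first/main post's link could be selected, assuming each post in a thread is available as a dict with a `url` and a `created_at` timestamp. The helper name `first_post_url`, the `posts` structure, and the example URLs are hypothetical and not taken from the actual script:

```python
from datetime import datetime


def first_post_url(posts):
    """Return the URL of the earliest post in a thread (hypothetical helper)."""
    if not posts:
        raise ValueError("thread has no posts")
    # The earliest post is treated as the main/first post of the thread.
    return min(posts, key=lambda post: post["created_at"])["url"]


# Illustrative data only; real posts would come from the XML files.
posts = [
    {"url": "https://example.com/thread/1#reply-2", "created_at": datetime(2024, 1, 3)},
    {"url": "https://example.com/thread/1", "created_at": datetime(2024, 1, 1)},
    {"url": "https://example.com/thread/1#reply-1", "created_at": datetime(2024, 1, 2)},
]

feed_data = {
    "title": "Example thread",
    # Link of the first/main post instead of the latest post.
    "url": first_post_url(posts),
}
```

Selecting by timestamp rather than by list position avoids depending on the order in which replies happen to be stored.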
> I'm thinking of updating the value of url key by the link of main/first post for the given title. feed_data = { ... 'url': <The link of first/main post (instead of latest post)>, ... }
Will that make the thread_url of the combined summary match the thread_url of the individual documents in the thread? If yes, then proceed with that.
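One way to verify this condition is sketched below, assuming the combined summary and the individual documents are available as dicts that each carry a `thread_url` field. The function name `mismatched_thread_urls` and the sample data are hypothetical:

```python
def mismatched_thread_urls(summary, documents):
    """Return the documents whose thread_url differs from the summary's (hypothetical check)."""
    expected = summary["thread_url"]
    return [doc for doc in documents if doc.get("thread_url") != expected]


# Illustrative data only; real documents would be read from the Elasticsearch index.
summary = {"title": "Combined summary", "thread_url": "https://example.com/thread/1"}
documents = [
    {"id": "a", "thread_url": "https://example.com/thread/1"},
    {"id": "b", "thread_url": "https://example.com/thread/1"},
]

mismatches = mismatched_thread_urls(summary, documents)
```

An empty `mismatches` list would indicate that the combined summary points at the same thread as its individual documents.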
The "Push Combined Summary From XML Files to ES INDEX" cron job is currently pushing combined summary documents to the Elasticsearch index with incorrect URLs. Specifically, the url and thread_url fields in each summary document are being set to the link of the last reply in the resource, rather than the correct, original link.
Background
The URLs in question are generated in the read_xml_file method as part of the XML processing workflow. This issue originates from the "XML Generation" cron job, which generates these XML files with the incorrect url here.
Proposed Solution
Review the URL generation logic in the XML generation cron job to ensure that link points to the correct resource link rather than the last reply in the resource. The link that we want here is actually the thread_url.