adobe / theblog

Japan feeds are not working #631

Closed kptdobe closed 2 years ago

kptdobe commented 3 years ago

Follow up task for #618.

I see 2 issues:

trieloff commented 3 years ago

@stefan-guggisberg can you take a look at the block list for the first issue?

@kptdobe there is a good chance that the query needs to be URL-encoded to work. I haven't tried it with non-ASCII characters yet.
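To illustrate the encoding concern, here is a minimal Python sketch of percent-encoding a non-ASCII query value. The helper name `encode_query_value` and the sample topic are assumptions made for illustration; the actual Japanese topic names are not listed in this thread, and this is not necessarily how the feed code builds its requests.

```python
from urllib.parse import quote, unquote


def encode_query_value(value: str) -> str:
    """Percent-encode a query-string value (hypothetical helper).

    The value is encoded as UTF-8 and every reserved or non-ASCII
    byte is escaped, so it can be placed safely after `=` in a URL.
    """
    return quote(value, safe="")


if __name__ == "__main__":
    # Made-up sample value, only to show what non-ASCII input looks like once encoded.
    topic = "デザイン"
    encoded = encode_query_value(topic)
    print(encoded)
    # Decoding gives back the original text.
    print(unquote(encoded) == topic)
```

Spaces are escaped as `%20` with this approach, which matters for multi-word topic names such as Digital Transformation.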

kptdobe commented 3 years ago

Ok, so @stefan-guggisberg you can wait until I have done the second part and made sure the queries are correct. No need to expose non-working feeds ;)

stefan-guggisberg commented 3 years ago

@kptdobe ok, tell me when you're ready.

kptdobe commented 3 years ago

Here is the list of feed URLs:

  1. https://blog.adobe.com/feeds/jp.xml
  2. https://blog.adobe.com/feeds/jp-3d-ar.xml
  3. https://blog.adobe.com/feeds/jp-ccdojo.xml
  4. https://blog.adobe.com/feeds/jp-community.xml
  5. https://blog.adobe.com/feeds/jp-corporate-news.xml
  6. https://blog.adobe.com/feeds/jp-design.xml
  7. https://blog.adobe.com/feeds/jp-digital-document.xml
  8. https://blog.adobe.com/feeds/jp-digital-transformation.xml
  9. https://blog.adobe.com/feeds/jp-education.xml
  10. https://blog.adobe.com/feeds/jp-photography.xml
  11. https://blog.adobe.com/feeds/jp-stock.xml
  12. https://blog.adobe.com/feeds/jp-ui-ux.xml
  13. https://blog.adobe.com/feeds/jp-video-audio.xml

Some notes:

  1. https://blog.adobe.com/feeds/jp-digital-transformation.xml: Digital Transformation is a parent-level topic. Logically, this feed should return Digital Transformation and all its children. To do that, the request to the query-index would have to pass every topic to filter on (9 in total) as search grouping predicates in the query string (hlx_group.1_property=topics&hlx_group.1_property.operation=like&hlx_group.1_property.value=Digital Transformation&...hlx_group.9_property=topics&hlx_group.9_property.operation=like&hlx_group.9_property.value=XYZ), which will almost certainly be too long. I therefore limited the filtering to the parent. As a consequence, the parent topic, which can normally be omitted because it is added automatically on the page, has to be set explicitly for an article to appear in the feed.
  2. The same applies to https://blog.adobe.com/feeds/jp-corporate-news.xml
  3. https://blog.adobe.com/feeds/jp-digital-document.xml is also a parent-level topic, but it has only 2 children, so I have created the full, correct request.
  4. All the others are leaves, so their filtering is fully correct.
  5. We need to test in production because the {article}.embed.html requests are pulled in with ESI includes, which time out or fail on hlx.page. It is almost impossible to get a correct feed XML stream there.
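The grouping predicates described in note 1 can be sketched roughly as follows. This is an illustrative Python snippet, not the code used by the blog: the function name `build_group_query` is hypothetical, and the parameter names are simply copied from the query string quoted above. The child topic `XYZ` is the same placeholder used in that note.

```python
from urllib.parse import quote, urlencode


def build_group_query(topics, prop="topics", operation="like"):
    """Build numbered hlx_group predicates, one group per topic.

    Hypothetical helper: the parameter layout mirrors the query string
    quoted in the notes above and is an assumption about the index API.
    """
    params = []
    for i, topic in enumerate(topics, start=1):
        prefix = f"hlx_group.{i}_property"
        params.append((prefix, prop))
        params.append((f"{prefix}.operation", operation))
        params.append((f"{prefix}.value", topic))
    # quote_via=quote escapes spaces as %20 and non-ASCII text as UTF-8 bytes.
    return urlencode(params, quote_via=quote)


if __name__ == "__main__":
    # Parent topic plus one placeholder child; a real request would list all 9.
    query = build_group_query(["Digital Transformation", "XYZ"])
    print(query)
    print(len(query))
```

Each additional topic adds three parameters, so the length grows linearly with the number of children. That growth is the reason the parent-only filter was chosen for the larger topic trees.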

kptdobe commented 3 years ago

@stefan-guggisberg Could you please make sure https://theblog--adobe.hlx.live/feeds/*.xml and https://blog.adobe.com/feeds/*.xml do not 404 (and/or explain what needs to be done, so that next time I know how to do it ;) )? Thanks a lot.

stefan-guggisberg commented 3 years ago

@kptdobe done

for the next time ;)

Key is the root path segment up to the first . or _

Value is an arbitrary value (Fastly dictionary entries need to have a value). I used ok.
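One possible reading of that key rule, as a small Python sketch: the function name `dictionary_key` and the sample paths are hypothetical, and the interpretation (take the first path segment, then cut it at the first `.` or `_`) is an assumption based only on the sentence above.

```python
import re


def dictionary_key(path: str) -> str:
    """Derive a dictionary key from a request path (hypothetical helper).

    Assumed rule: take the first path segment, then keep only the part
    before the first '.' or '_'.
    """
    root_segment = path.lstrip("/").split("/", 1)[0]
    return re.split(r"[._]", root_segment, maxsplit=1)[0]


if __name__ == "__main__":
    # Under this reading, every feed URL in the issue maps to the same key.
    for sample in ("/feeds/jp.xml", "/feeds/jp-design.xml"):
        print(sample, "->", dictionary_key(sample))
```

If this reading is right, a single dictionary entry would cover all of the feed URLs listed earlier, with `ok` as the value.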

kptdobe commented 3 years ago

Thanks!

kptdobe commented 3 years ago

It still does not work because of https://github.com/adobe/helix-pages/issues/794