[QUESTION] How to handle data fetching when there are multiple entrypoints to get all associated data?

h-e-l-l-o-w-o-r-l-d commented 9 months ago

Hi François,

I want to fetch Instagram stories and their media data. The problem: there are two entry points, so you need one API call to get all story IDs and then one API call per story to get its media data:

For stories to fetch the story-IDs: https://graph.facebook.com/{business-id}/stories
For media data of the story: https://graph.facebook.com/{story-id}?fields=caption,id,ig_id,media_product_type,media_type,media_url,permalink,thumbnail_url,timestamp

I am wondering how to achieve that with External Import. What comes to mind:

1. Fetch the story IDs, save them to JSON, then read that JSON in another import configuration and fetch the media data.
2. Fetch the story IDs and then fetch the media data in a custom step with "curl".

Especially the second option doesn't feel right. Maybe I'm missing something, so what would be your recommendation?

Thanks in advance.

fsuter commented 9 months ago

I can see 2 possible ways to go about this:

1. A custom step that comes after the HandleDataStep: you fetch the media for each record and add it in the same way External Import would have done if the media had been available in the main JSON structure, i.e. in the structure that the substructureFields property would have generated. After that you can use External Import as usual, with a user function transformation for storing each media and the children property for creating the sys_file_reference entries.
2. The DatamapPostprocess event, during which you fetch the media for each record. In this case you need to write your own code for storing the media and creating the sys_file_reference entries, but it may be easier than creating the proper structure as outlined in the first point.

HTH

h-e-l-l-o-w-o-r-l-d commented 9 months ago

Thanks for your quick reply. I think you misunderstood. I don't need sys_file_reference, because the media data will be stored as URL strings in simple input fields in the table tx_news_domain_model_news. So in the end the fetched data should look like:

"data": [
  {
    "id": "12345678901234567",
    "ig_id": "9876543210987654321",
    "media_product_type": "STORY",
    "media_type": "VIDEO",
    "media_url": "https://...",
    "permalink": "https://...",
    "thumbnail_url": "https://...",
    "timestamp": "2024-02-05T14:32:35+0000"
  },
  {
    "id": "12345678901234568",
    "ig_id": "9876543210987654322",
    "media_product_type": "STORY",
    "media_type": "VIDEO",
    "media_url": "https://...",
    "permalink": "https://...",
    "thumbnail_url": "https://...",
    "timestamp": "2024-02-05T16:03:45+0000"
  }
]

The entries you see are media data of each story and we actually only need the URLs.

Now the difficulty: You have to fetch each entry (the media data) separately via the second entry point, because the story ID is part of the entry point. But before you can do that, you have to know the IDs of the stories via the first entry point. See my initial post for entry points.

I hope it's clear enough now? Sorry for confusing you.

h-e-l-l-o-w-o-r-l-d commented 9 months ago

... ideally I want only one import configuration to get the data you see in the previous post.

fsuter commented 9 months ago

Thanks for clarifying. I think I would use the DatamapPostprocess event in such a case. It provides you with a list of all records that were handled, with their uids, including the new ones. Then you can loop over the records and fetch the media data for each. You can't benefit from all the mapping and transformation tools provided by External Import, but since all you want is the URLs to store in a text field, I don't think it is much code to write.
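
The per-record loop could look roughly like the sketch below. It is plain PHP and not part of External Import's API: the function names buildStoryMediaUrl() and fetchStoryMedia() are hypothetical, and the HTTP call is passed in as a callable so that authentication (the access token, which the URLs in the question leave out) and error handling stay in your own code. The URL and field list are taken from the original question.

```php
<?php
/**
 * Builds the URL of the second entry point for one story.
 * The field list is the one given in the original question.
 */
function buildStoryMediaUrl(string $storyId): string
{
    $fields = 'caption,id,ig_id,media_product_type,media_type,media_url,permalink,thumbnail_url,timestamp';
    return 'https://graph.facebook.com/' . rawurlencode($storyId) . '?fields=' . $fields;
}

/**
 * Fetches the media data for each story ID, one call per story.
 *
 * @param string[] $storyIds IDs obtained from the first entry point
 * @param callable $fetch    hypothetical HTTP helper: receives a URL, returns the response body
 * @return array[]           one decoded media record per story that returned a JSON object or array
 */
function fetchStoryMedia(array $storyIds, callable $fetch): array
{
    $media = [];
    foreach ($storyIds as $storyId) {
        $decoded = json_decode($fetch(buildStoryMediaUrl((string)$storyId)), true);
        // Skip responses that do not decode to an array (invalid JSON or a bare scalar)
        if (is_array($decoded)) {
            $media[] = $decoded;
        }
    }
    return $media;
}
```

Inside a DatamapPostprocess listener, the story IDs would come from the imported records, and the media_url, permalink and thumbnail_url values of each returned entry would then be written to the corresponding fields.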

h-e-l-l-o-w-o-r-l-d commented 9 months ago

So basically in the Datamap Postprocess I use the classes from

https://docs.typo3.org/p/cobweb/external_import/7.2/en-us/Developer/Api/Index.html

where I can change the importer config (to define a different API entry point) and fetch the media data in a loop, right?

fsuter commented 9 months ago

Yes, you can do that. I did that in one project. I would suggest having a separate configuration, even though I heard your preference for having a single one. I think it makes things clearer to maintain.

h-e-l-l-o-w-o-r-l-d commented 9 months ago

Yes, that makes sense. So you mean I should define a configuration in the TCA file as usual and use it only in the Datamap Postprocess? And the configuration index is then used as $index in

$messages = $importer->import($table, $index, $rawData);

Or should I define the configuration somewhere else?

fsuter commented 9 months ago

In the TCA, as usual. Just with a different index, and indeed you pass the index in your call to import.

h-e-l-l-o-w-o-r-l-d commented 9 months ago

Great, will do so. Thanks a lot for your super fast replies and the time you have put into this extension. External Import is for sure one of my favourite TYPO3 extensions. :o)

bh-teufels commented 4 months ago

I have a similar requirement. I want to import data from mobile.de SearchAPI.

Can I also pass access data for the URI when using connector = feed?

And what would be the recommended approach here?

My original idea would otherwise be to call the mobile.de API in advance from a Scheduler task, go through the various entry points, save all the required data in an .xml file and then work with this file in external_import.

According to my interpretation of the documentation, I might also be able to achieve this via the "Process connector parameters" event, or possibly via a custom data handler or a custom process step.

fsuter commented 4 months ago

You can't pass access data to svconnector_feed, as it is not designed to receive any. The recommended way to do this would be to create your own connector, possibly extending svconnector_feed, and implement the needed authentication logic.

Your approach with a Scheduler task can also work of course.

bh-teufels commented 4 months ago

Thank you.

Where can I find the assignment that determines which class is used for connector = 'feed'? I would like to use my own connector 'feed-mobilede' and define that it should use my custom connector class.

Is it only this line in ConnectorFeed.php? protected string $type = 'feed';

fsuter commented 4 months ago

Yes, this is all that's needed.

bh-teufels commented 4 months ago

Is there any interest in including this in svconnector_feed in general?

Extension of the configuration:

'connector' => 'feed',
'parameters' => [
    'uri' => 'https://services.mobile.de/search-api/search?customerNumber=<override_with_your_customerNumber>&pageSize=100',
    'accept' => 'application/vnd.de.mobile.api+xml',
    'username' => 'override_with_your_user',
    'password' => 'override_with_your_pw',
],

Extension of the query function:

// Collect optional HTTP headers; $headers stays null if none are configured
$headers = null;
// Basic authentication, only when both username and password are set
if (array_key_exists('username', $parameters) && array_key_exists('password', $parameters)) {
    $auth = base64_encode($parameters['username'] . ':' . $parameters['password']);
    $headers = array_merge($headers ?? [], ['Authorization' => 'Basic ' . $auth]);
}
// Optional Accept header
if (array_key_exists('accept', $parameters)) {
    $headers = array_merge($headers ?? [], ['Accept' => $parameters['accept']]);
}
// Optional User-Agent header
if (array_key_exists('useragent', $parameters)) {
    $headers = array_merge($headers ?? [], ['User-Agent' => $parameters['useragent']]);
}

That's all that was needed.

fsuter commented 4 months ago

I would rather not, as there are many different ways of authenticating. I have actually never had to cope with basic authentication, rather with an exchange of tokens, which requires its own logic.

What could be done and would cater to your need is to introduce an event to modify the headers.

bh-teufels commented 4 months ago

Thank you. Until a possible implementation via an event to modify the headers, I will work with a fork and branch.

cobwebch/external_import, issue #313