digitalmethodsinitiative / 4cat

The 4CAT Capture and Analysis Toolkit provides modular data capture & analysis for a variety of social media platforms.

Cartographer and PixPlot Image preview #400

Open dale-wahl opened 10 months ago

dale-wahl commented 10 months ago

This is merge-able!

Notes:

To-do:

dale-wahl commented 9 months ago

Current results seem solid.

To-do:

stijn-uva commented 7 months ago

Also, it would be nice to make this compatible with all image-generating/downloading processors, including 'Extract video frames'.

dale-wahl commented 7 months ago

Right now, I need the cartographer to understand what to do with subfolders of image groupings. Or perhaps iterate_archive_contents doesn't handle them, since the files are not found (possibly a mismatch between the temp_path yielded and where the file was actually extracted). I could unpack and os.walk in the cartographer, but I really dislike that method, as it does not work well with amount caps: the whole (often quite large) archive is unpacked anyway.
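For illustration, a minimal sketch of applying an amount cap without unpacking the whole archive first. This is not 4CAT's iterate_archive_contents; the function name iter_capped_images, its parameters and the extension list are hypothetical, and it only assumes the images sit in a zip archive, possibly inside subfolders. It uses the path returned by ZipFile.extract, so the yielded path always matches where the file actually landed.

```python
import zipfile
from pathlib import Path


def iter_capped_images(archive_path, staging_dir, max_images=0,
                       extensions=(".jpg", ".jpeg", ".png", ".gif")):
    """Yield extracted image paths one by one (hypothetical helper).

    max_images=0 is treated as "no cap" in this sketch.
    """
    staging_dir = Path(staging_dir)
    yielded = 0
    with zipfile.ZipFile(archive_path) as archive:
        for member in archive.infolist():
            if member.is_dir():
                continue
            if Path(member.filename).suffix.lower() not in extensions:
                continue  # skips .metadata.json and other non-image files
            if max_images and yielded >= max_images:
                break  # cap reached: remaining members are never extracted
            # extract() recreates the member's subfolders under staging_dir
            # and returns the path the file was actually written to
            extracted = Path(archive.extract(member, path=staging_dir))
            yielded += 1
            yield extracted
```

Because extraction happens per member, a cap of, say, 100 images only ever writes 100 files to disk, regardless of archive size.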

Regardless, I think we could do better than to just splash the images on the page in roughly the order they come. They could be automatically categorized by the subfolders they are in, for example (mimicking the scene timeline).
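As a rough sketch of the subfolder idea: grouping archive member names by their top-level folder gives one category per folder. The function name group_by_subfolder and the "(root)" label are hypothetical, and the sketch assumes zip-style forward-slash paths; it does not reflect how the cartographer currently derives categories.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_subfolder(member_names, root_label="(root)"):
    """Map each top-level folder name to the files below it (hypothetical helper)."""
    groups = defaultdict(list)
    for name in member_names:
        parts = PurePosixPath(name).parts
        # files directly in the archive root have no folder to use as a category
        label = parts[0] if len(parts) > 1 else root_label
        groups[label].append(name)
    return dict(groups)
```

Input order is preserved within each group, so frames from one video would stay in the order the archive lists them.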

dale-wahl commented 7 months ago

Sorted out subfolder handling in the cartographer. It now works with the video-frames processor. Currently there is no usable .metadata.json file (it looks to be copied over from the higher-level processor and does not have the image filename references needed to extract post_ids), so images have no descriptions. I did not do anything to categorize the images by scene. That's more complex, as the categories are currently decided from the metadata (which does not exist). This would be a special case.

dale-wahl commented 4 months ago

I merged master into cartographer again. Last week there were a few bugs. This week the only thing I saw was actually an issue in master: if we allow unlimited images (e.g., max images is set to 0 in the configuration), all the downloaders would use a max of 0 and thus always download all images!
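One possible way to keep the two limits apart, sketched under assumptions: the function and parameter names are hypothetical, 0 is taken to mean "unlimited" for both values, and the intended behaviour (honour the user's requested amount when the configuration sets no cap) is my reading of the issue, not confirmed 4CAT behaviour.

```python
def resolve_image_cap(requested, configured_max):
    """Combine a user-requested amount with the configured maximum.

    Hypothetical helper; 0 means "unlimited" for either argument.
    """
    if configured_max == 0:
        # no configured cap: honour the request (0 stays unlimited)
        return requested
    if requested == 0:
        # user asked for everything, but the configuration caps it
        return configured_max
    return min(requested, configured_max)
```

With this, a request for 50 images under an unlimited configuration stays at 50 instead of silently becoming "download everything".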

There is this one super weird bug, so far only visible on TikTok datasets... [image] I am not sure what's going on there, but it only affects the thumbnails, so I need to figure out why that is the case.

dale-wahl commented 4 months ago

OK, OK. It actually just occurs when there are very few images. PixPlot itself fails if you have fewer than 12 (and our cartographer ignores/bypasses that). It seems to have to do with the zoom not being far enough away that it doesn't trigger the thumbnails. No idea how to fix or address it... but yeah.

sal-uva commented 3 weeks ago

Tried to run this, but I'm getting some import errors in JS: [image]

sal-uva commented 3 weeks ago

I would also suggest renaming the processor to something more specific. "Create Image visualisation" is a bit general; something like "Display images on Web page" is already a bit more concrete.

Relatedly, it is a bit unclear to me what I can do with the output zip file. Can the description be updated so users are told what they can do with it? And maybe add a readme file to the zip results?

As I understand it, the page opened by the "View" button is what's most interesting here, right? Can't the zip file contain this page as well?

Otherwise good work!!