FeLoe / DataDonations2021

Material for Organising a Mobile Lab

Checking whether Selenium WhatsApp scraper can be used remotely #46

Open FeLoe opened 3 years ago

FeLoe commented 3 years ago

We have a Selenium scraper for WhatsApp. However, we cannot ask respondents to set it up on their own laptops, so it would have to run remotely. The problem is that you first need to scan a QR code before the website opens.

We need to check whether there is any way to use that scraper; otherwise we are back to having people manually export single chats, transfer them to their laptop, and drag them into the system.

vanatteveldt commented 3 years ago

I would think the procedure could be:

(1) Open a Zoom or other channel and share our screen.
(2) Carefully show and tell respondents what we want to do and what data it would extract, and show e.g. an example with our own WhatsApp or a WhatsApp account made for this purpose. Check consent again before proceeding.
(3) Start the scraper, which opens WhatsApp Web and shows the QR code on the shared screen.
(4) Ask the respondent to scan the QR code from the shared screen.
(5) Keep the screen shared while the bot does its work, log off WhatsApp, and show the data we stored.

Now, of course, the bot will actually open all WhatsApp messages right on the screen. How can we prevent ourselves from looking, and convince the respondent that we are not looking? Can we minimize the scraper, or insert some custom CSS to black out all text (except for links, just for show)? The latter sounds doable and would be good to show in (2) to convince the respondent that we really don't want to see their chats.
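To make the black-out idea concrete, here is a minimal sketch in Python (assuming the scraper is driven from Python). The function name, the link color, and the blanket `*` selector are illustrative assumptions and are not taken from WhatsApp Web's real stylesheet. Images, and emoji rendered as images, would stay visible and need separate handling.

```python
def blackout_css(link_color: str = "#0645ad") -> str:
    """Return CSS that makes all text invisible except hyperlinks."""
    return (
        # Make every piece of text transparent, including any text shadow.
        "* { color: transparent !important; text-shadow: none !important; }\n"
        # Links (and anything nested inside them) get a visible color back;
        # `a` is more specific than `*`, so this rule wins.
        f"a, a * {{ color: {link_color} !important; }}\n"
    )


if __name__ == "__main__":
    print(blackout_css())
```

The returned string still has to be injected into the page, for example through a `<style>` element.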

FeLoe commented 3 years ago

Might be a possible solution. The problem is that we need to be in contact "personally" with every user, so for something more large scale this might become difficult. I am not sure how to organise screen sharing this way. I have also never tried to insert custom CSS, but since we only need one small part of the website for the links, that might be an idea.

One other option (not necessarily easier I think) is to have participants do the WhatsApp login once and then send us their (Firefox or Chrome) profile. When you then use that profile in the Selenium scraper usually no additional QR code scan is needed. But I don't think having them do the scan + locate and send the profile is any easier than the screen sharing option.
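For reference, a minimal sketch of what reusing a Chrome profile could look like with Selenium in Python. The helper names and the example path are hypothetical. `--user-data-dir` is a standard Chrome switch, but whether a WhatsApp Web session survives being copied to another machine is an assumption we would have to test.

```python
import os


def chrome_profile_arguments(profile_dir: str) -> list[str]:
    """Build the Chrome command-line switch that points at a saved profile."""
    return [f"--user-data-dir={os.path.expanduser(profile_dir)}"]


def start_chrome_with_profile(profile_dir: str):
    """Start Chrome through Selenium with the given profile directory."""
    # Imported lazily so the helper above works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for argument in chrome_profile_arguments(profile_dir):
        options.add_argument(argument)
    return webdriver.Chrome(options=options)


if __name__ == "__main__":
    # Hypothetical location of a profile sent in by a respondent.
    print(chrome_profile_arguments("/tmp/respondent42"))
```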

FeLoe commented 3 years ago

Just tried it with a super simple iframe implementation of Jitsi and it worked well. I screen shared the QR code from my laptop to my tablet, scanned it there and everything opened :) Seems like this could be a good solution! We would mostly need to generate a meeting ID for each user (most likely including some respondent ID), make sure a corresponding Jitsi meeting is launched on a server, and then "show" the user the scraping process. The good thing is that we can disable beforehand that Jitsi asks the user for audio/video/screen sharing, which also prevents them from turning their camera/mic on.

<iframe allow="" src="https://meet.jit.si/Newtry" style="height: 400px; width: 50%; border: 0px;"></iframe>
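Generating a meeting per respondent could look roughly like this in Python. This is only a sketch: the function names and the `R123` respondent ID are made up, the random suffix is my own addition to make room names hard to guess, and the `config.startWithAudioMuted` / `config.startWithVideoMuted` URL overrides should be checked against the Jitsi version we end up using. The empty `allow=""` attribute is the same as in the iframe above, so the embedded page gets no camera or microphone permission.

```python
import secrets
from html import escape
from urllib.parse import quote

JITSI_BASE = "https://meet.jit.si"  # public instance used in the test above


def meeting_url(respondent_id: str) -> str:
    """Build a hard-to-guess Jitsi room URL for one respondent."""
    token = secrets.token_urlsafe(8)  # random suffix, different on every call
    room = quote(f"{respondent_id}-{token}", safe="")
    # Assumed Jitsi URL overrides: join with microphone and camera muted.
    overrides = "config.startWithAudioMuted=true&config.startWithVideoMuted=true"
    return f"{JITSI_BASE}/{room}#{overrides}"


def meeting_iframe(url: str) -> str:
    """Wrap the room URL in an iframe that grants no device permissions."""
    src = escape(url, quote=True)
    return (
        f'<iframe allow="" src="{src}" '
        'style="height: 400px; width: 50%; border: 0px;"></iframe>'
    )


if __name__ == "__main__":
    print(meeting_iframe(meeting_url("R123")))
```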

FeLoe commented 3 years ago

And this is what happens if I include the simple line body { filter: blur(30px); } at the beginning of the Web WhatsApp CSS - I think that way it is sufficiently blurred :)
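One possible way to apply that rule from a Selenium script in Python is to inject a `<style>` element with `execute_script`. This is a sketch and not necessarily how the existing scraper does it; `blur_script` and `blur_page` are hypothetical names, and `driver` is assumed to be any Selenium WebDriver instance.

```python
import json


def blur_script(radius_px: int = 30) -> str:
    """Return JavaScript that appends a <style> element blurring the body."""
    css = f"body {{ filter: blur({radius_px}px); }}"
    return (
        "const style = document.createElement('style');"
        # json.dumps yields a safely quoted JavaScript string literal.
        f"style.textContent = {json.dumps(css)};"
        "document.head.appendChild(style);"
    )


def blur_page(driver, radius_px: int = 30) -> None:
    """Inject the blur rule into whatever page `driver` currently shows."""
    driver.execute_script(blur_script(radius_px))


if __name__ == "__main__":
    print(blur_script())
```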

[Screenshot, 2021-01-26 14:17:05: WhatsApp Web with the blur applied]

vanatteveldt commented 3 years ago

Excellent! I don't really see a way to do this without screen sharing or something similar (we could email them the QR code, but since you only have limited time to scan it I'm not convinced that really makes things better).

If setting up the sharing is problematic, we could post the QR code to a public website (https://digitalevoetsporen.nl/RESPONDENTID) and ask them to go there. But again, I'm not really sure that makes it easier, and showing them what we do might make them more comfortable?

In any case, the screen sharing could be done even with hundreds of respondents, given enough student assistants.

FeLoe commented 3 years ago

I think if we can do it like this (embedded Jitsi, blurred webpage) it is by far the easiest solution for both us and them. I already added the blurring to the existing WhatsApp scraper, apart from that nothing changes. So the steps are:

1) Create a Jitsi link for each respondent
2) On the server, start up a Jitsi meeting and the scraper displaying the QR code
3) Have respondents start the meeting on the website
4) Start screen sharing
5) When the code is scanned, blur the website
6) When all data is scraped, transfer it to the user for inspection
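The server-side part of these steps (show the QR code, wait for the scan, blur, scrape) could be sequenced roughly as below. This is a sketch in Python with hypothetical names: the four callables stand in for whatever the real scraper does, and how to detect a completed login is deliberately left to the caller.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout_s: float, poll_s: float) -> bool:
    """Poll `condition` until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


def run_session(show_qr, is_logged_in, blur_page, scrape,
                timeout_s: float = 120.0, poll_s: float = 1.0):
    """Show the QR code, wait for the scan, blur the page, then scrape."""
    show_qr()
    if not wait_until(is_logged_in, timeout_s, poll_s):
        raise TimeoutError("QR code was not scanned in time")
    blur_page()  # blur before any chat content is opened on screen
    return scrape()
```

Blurring right after the login check keeps the chats unreadable on the shared screen for the whole scraping phase.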

The main issue I see here is that we will have the "unfiltered" data (the participant has not yet removed the info they do not want to share) on a server somewhere. So we might look into using something temporary for this that can be "destroyed" after sending the data to the user for inspection, and then have them send the filtered data back to the final destination.
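As an illustration of "something temporary", here is a Python sketch using a throwaway directory that is removed as soon as the data has been read back. `scrape_to_payload` and the `unfiltered.json` file name are hypothetical, and note that deleting a directory is not the same as securely erasing the disk blocks.

```python
import tempfile
from pathlib import Path
from typing import Callable


def scrape_to_payload(scrape_into: Callable[[Path], None]) -> bytes:
    """Let the scraper write into a temporary directory and return the bytes.

    The directory, including the unfiltered file, is deleted when the
    `with` block exits, so only the returned payload remains in memory.
    """
    with tempfile.TemporaryDirectory() as tmp:
        output = Path(tmp) / "unfiltered.json"
        scrape_into(output)  # the scraper writes its result to this path
        return output.read_bytes()
```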

vanatteveldt commented 3 years ago

Sounds good. If desired we can add more feel-good steps (showing the respondent how the program works before getting them to scan, explicitly blurring out only the text and not the links, etc.) but that's just candies for dandies.

We would not have to store the data anywhere, we can just keep it in memory. Bob's idea was that all server-side anonymization etc would happen only transiently, and the whole data file would be sent back to the client for visualization and consent. So I think this would fit into that general thinking?
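A minimal sketch of the keep-it-in-memory variant in Python: the scraped records are serialized and compressed into a bytes object that can be sent straight back to the client, without touching the disk. The function names and the record layout are assumptions for illustration only.

```python
import gzip
import json


def pack_for_client(records: list[dict]) -> bytes:
    """Serialize and compress records entirely in memory."""
    raw = json.dumps(records, ensure_ascii=False).encode("utf-8")
    return gzip.compress(raw)


def unpack_on_client(payload: bytes) -> list[dict]:
    """Reverse of pack_for_client, as the receiving side would do it."""
    return json.loads(gzip.decompress(payload).decode("utf-8"))
```

Once the payload has been sent and the reference is dropped, nothing remains on the server beyond what the process still holds in memory.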

FeLoe commented 3 years ago

Yes, that sounds good (and like you will be the person explaining this in detail to the ethics committee :)). Ok, but for now I will assume that we use the scraper. Then I can start implementing anonymization steps in the scraper and tick off another box.