Closed MarcoViola88 closed 5 months ago
Fair points. We could put the convincing/realness scale on the first viewing only in the fiction condition to avoid the double presentation, what do you think? That'd mitigate the overall suspicion issue.
I really like how you framed that and indeed I didn't think of it in terms of 1st/3rd person + aesthetic studies.
a single self-report measure people will somehow "put everything" in there
I feel like if we want this study to be a step forward from our previous work we should aim at being a bit more refined in our exploration of the psychological mechanisms hypothesized to be at play. In this case, having 2 variables would make the data richer (we can average them - if they are very correlated - to get an even more robust general measure; and analyze them separately - which I'm optimistic would yield some interesting dissociation). We could potentially explore the idea that the effect of fiction on emotion is related to a lowering of "Self" engagement, and thus would be more marked in the subjective scale than in the more objective one.
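A minimal sketch of the "average if correlated, otherwise analyze separately" logic, in Python. The ratings and the 0.7 cut-off are made-up assumptions for illustration only, not values from the study, and the function names are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of ratings."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def composite(x, y):
    """Trial-wise average of the two scales."""
    return [(a + b) / 2 for a, b in zip(x, y)]

# Hypothetical ratings on a 0-1 analog scale (illustration only)
arousal = [0.2, 0.5, 0.9, 0.4]
sexy = [0.3, 0.6, 0.8, 0.5]

THRESHOLD = 0.7  # assumed cut-off for "very correlated"
r = pearson_r(arousal, sexy)

# Build the general measure only if the two scales are strongly correlated;
# either way, both scales remain available for separate analysis.
general = composite(arousal, sexy) if r >= THRESHOLD else None
```

In practice the correlation would be estimated within participants and with a proper model, so this only shows the decision rule.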
they might slow down the experiment a bit
It takes <1.5 seconds to answer a scale once participants are familiar with the instructions, so that's like + <2min total, which I think is worth it.
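As a rough check of that estimate, here is the arithmetic as a small Python sketch. It assumes one extra scale per trial, 1.5 s per response as an upper bound, and the 80 stimuli mentioned later in this thread; the function name is hypothetical.

```python
def added_minutes(n_trials, seconds_per_scale=1.5, extra_scales=1):
    """Upper bound on the extra time (in minutes) added by the extra scale(s)."""
    return n_trials * seconds_per_scale * extra_scales / 60

print(added_minutes(80))  # 2.0
```

With fewer stimuli the added time shrinks proportionally.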
if we give two measures they could arbitrarily interpret them, and each one would be reporting something different
Agreed, we need to work on the phrasing here
"Focus on the first/third person" is a bit abstract (at least to me ^^), maybe something like:
Let's continue refining :)
You totally convinced me about the 2 stimuli, thank you.
Realness
Fair points. We could put the convincing/realness scale on the first viewing only in the fiction condition to avoid the double presentation, what do you think? That'd mitigate the overall suspicion issue.
I'm still a bit concerned about asking realness together with the other 2 ratings, even if only for AI-generated stimuli. There are 2 reasons: (i) isn't that a bit hard to judge in just 2-3 secs? (ii) wouldn't asking that before / after asking to rate sexiness and arousal be influenced / influence the process of making explicit and conceptualising how arousing & how sexy we find the stimulus? That being said, when I say that I am "a bit concerned", the "bit" is not rhetorical: my ideas are not 100% clear about this, so I am open to arguments (or even to deferring to the judgment of whoever has a clearer idea in mind).
wouldn't asking that before / after asking to rate sexiness and arousal be influenced / influence the process
Yes you're right about the fact that it potentially creates a confound...
Mmh
Okay then, maybe we re-lower slightly the number of stims (not 100% sure that's necessary, we can try first with the 80 and see how much time it adds up) and, indeed, have a second run at the end where we show (all?) the pictures and we say "In this second phase, we are evaluating the quality of our algorithm, and would like to ask you whether you notice any issues or problems with the images presented to you in the previous phase." and have a rating of like "Obviously fake - Very realistic".
My hope is that we would observe: 1) overall, most (fiction-condition) items rated as very realistic, and if not we can use that data to filter out some items. 2) a possible lingering difference between items presented as real and fake, the latter being rated slightly less realistic than the ones presented as photos. This would show a deep effect of our manipulation that affects (from memory) the re-exposure.
One issue is that one could argue that the ratings of arousal are then influencing the a posteriori ratings of reality, but that's fine as 1) the former is our primary target of interest and 2) we can statistically check the effect of arousal on reality beliefs and its interaction with the condition. 3) if that proves to be a big issue we can always run a subsample in the future with the order of the tasks reversed and see.
Arousal
Maybe we should ask this scale first, it's easier to then dissociate the 3rd person
Sounds reasonable, right.
Instructions: How much do you find the image arousing. Did you feel a reaction in your body (whether positive or negative) This I'm not sure but in theory arousal is explicitly dissociated from valence, but since we don't ask about valence IDK? This question is about your own personal reaction. MMM, I would avoid getting in trouble by opening the Pandora's box of valence, i.e. I'd avoid mentioning positive/negative.
Prompt (in the task) How much did you feel a reaction to the image in your body?
Sexy: There is an implicit (IMO useful) distinction that we create by using the terms "feel" above and "think" below. Instructions: This question is about how sexually appealing the image is. In other words, how sexy do you think the image would be perceived by an average viewer similar to you in terms of gender and sexual orientation? Cool, I agree that feeling/thinking would do a good job subliminally.
Prompt: Do you think this image would be considered sexy by others?
Despite having proposed the notion of "average viewer similar to you in terms of gender and sexual orientation" myself, I am not 100% sure we should keep it. I suspect that keeping it = improving accuracy. But by refraining from specifying who the average observer is (i.e. one with the same sexual orientation) we would probably get a stronger dissociation.
I'll stop here for now to allow others to express their thoughts!
Hey guys, I read almost everything and I'm ok with the direction the study is taking (ok for the 4x2 paradigm!). Here are a bunch of things I'd like to say:
1 - About the individual traits scales (GAAIS, etc.): I bumped into this very interesting preprint, whose name is encouraging: "Can we assess attitudes toward AI with single items?" Apparently, the authors are confident we can. I see the importance of replicating the GAAIS (non)results, but I also see the problem of increasing the duration of the exp. My golden rule would be: anything < 20 minutes is ok (see also Revilla & Höhne, 2020).
2 - I like the fact that the image remains on the screen for a couple of seconds. It's different from our previous work, but I find it crucial, especially if we have images of couples, which are less likely to be perceived as AI-generated. One thing I'd slightly change is the presentation; I'd say: 1: AI-generated/Real > 2: the photo appears (but AI-generated/Real remains on the screen with the photo) > 3: everything disappears
3 - I see and appreciate Dominique's bottom-up logic saying: if the two measures are correlated, we average them; if they're not, even better: we have two potentially different measures. However, and I'd like to ask this to our philosophers here: let's assume that we average the two measures: what are we assessing, exactly? Can we think about some underlying factor?
4 - As for the main measures; I like the 1st/3rd person phrasing and the feel/think differentiation. However, I would avoid lengthy prompts and especially the "body" word. I suspect youngsters feel the body less, and/or are less prone to say that something was moving in their body (sounds a bit pervy?). This would lead to a potential floor effect in the arousal measure and thus a decrease in the correlation of the two measures.
I'd go for something like:
Arousal How much did you feel sexually excited by this image?
Sexy Do you think others* would consider this image as sexy?
*"Others" has some issues too: who are the others? What are their sexual orientations? Here are two alternatives: How much do you think that other people would consider this image as sexy, on average? (wordy) How sexy do you think others would find this image, on average?
5 - I love Dominique's idea about the "we are evaluating the quality of our algorithm" thing!
Hopefully, tomorrow I'll read everything again and come up with new suggestions. @DominiqueMakowski when's your deadline?
Bisous, besos, and baci =)
Revilla, M., & Höhne, J. K. (2020). How long do respondents think online surveys should be? New evidence from two online panels in Germany. International Journal of Market Research, 62(5), 538–545. https://doi.org/10.1177/1470785320943049
Thanks @AleAnsani great comments ☺️
I made some changes:
Can you give it a try and let me know how long it currently takes, so that we have a better idea? Note: if you want to check specific things, you can skip the image & fixation cross by pressing "S"
I'd still push back on having overlap text + image:
1) although we're not doing neuroimaging or eyetracking here (in which case it would be problematic as we wouldn't know whether they are reading or watching) I'd still prefer to have them separate to have theoretically disentangled processes
2) I have trouble seeing the benefits of having the text on, as a "reminder" I suppose, but I really doubt that they'll forget the cue. I'd even say it could add some noise: what (I think) we (at least me ^^) are trying to see is the effect of a priori beliefs of fakeness (expectations, or "priors" in a Bayesian sense). Hence the cue beforehand that primes the participants. But if we have it overlaid, some weird interaction can (more easily) take place: people can look at the image, then re-read the text, then potentially feel conflicted, then look at the image again and try to resolve their mental conflict etc. etc. What I'm trying to say is that if we have the image + the text we know even less what participants are doing, which is not paradigmatically optimal
Hi guys, thanks everybody for the intense brainstorming, and double thanks to Dominique for trying to come up with some synthesis & implementation! Given that we need to settle on something, I'd try to aim at consensus. This morning I tested the exp. After 2 min on mobile, I restarted on a laptop and completed it in 14 mins -- which should be fine. Here are some observations:
-- On mobile, I'd add some border on the left & right of the informed consent to improve readability.
-- In the instructions, there is still "Picture" (whereas the cue for real images is "Photo"), so it should probably be changed.
-- Also in the instructions, in "high quality erotic (but also non erotic content)", the bracket should probably be closed a little earlier.
-- When asking ethnicity, I would use a fixed-choice menu rather than allowing inputs. But maybe it's even better to ask for countries rather than ethnicity.
-- We should probably ask for some info on sexual orientation (either via inventories or with a single question?).
-- Based on Dominique's reflection that people would probably find it easier to start from their subjective feeling before answering an abstract question, I agree that we should probably invert the order of SEXY / AROUSAL, both in the instructions and in the task.
-- When we ask about arousal, I am happy not to mention the body. However, since the "average sexiness" is ABOUT the image, I would also specify, for the sexual arousal question, that our question is about the image (in Russell's theory terms, we want to measure Perception of Affective Quality, not Core Affect). Hence, instead of "How much do you feel sexually aroused" I propose "How much do you feel sexually aroused by this image".
Concerning the "keep the cue" issue, I am conflicted. On the one hand, I can see Dominique's concerns that keeping the cue alongside the image as proposed by Alessandro might distract people (are they looking at the image or at the cue?) and ultimately elicit different processes. On the other hand, having performed the task, which is rather repetitive, I noticed that I hardly paid the cue enough attention. However, here is a way out I thought about (provided it can be coded without excessive effort): we should probably give a CUE before and then reinforce the manipulation AFTER, i.e. during the instructions. In other words, rather than "How much would you consider this image to be sexy on average" / "How much do you feel sexually aroused by this image", I propose we ask: "How much would you consider this PHOTO to be sexy on average" / "How much do you feel sexually aroused by this PHOTO" or "How much would you consider this AI-GENERATED IMAGE to be sexy on average" / "How much do you feel sexually aroused by this AI-GENERATED IMAGE". Maybe the caption is too much, but I would reinforce the manipulation, which otherwise risks being a little too mild -- a concern coming from the small results we found in the (Marini et al.) paper currently under review
In any case, given that Dominique is putting a lot of effort and thought into it, I think it's good that we discuss, but when it comes to deciding what to implement I trust his judgment -- and in the worst-case scenario we could switch the variables at some point (provided we stay sufficiently vague during the IRB/preregistration stage, that shouldn't be impossible, right?).
Hello everyone! Sorry for taking so long. I've tried the exp, it took me 16 minutes, which is fine. Here are my considerations:
I fear the manipulation might be a bit weak. Maybe it's just me, but after a bunch of pictures, due to time pressure, I was so concentrated on the actual images that I completely forgot about the labels that preceded them. I didn't really pay attention to them, I just experienced them as some sort of countdown. We all know that priming works even subconsciously, but my guess would still be that if we want to keep the labels as primes (@DominiqueMakowski 's idea of priors arouses me more than the pics!), we should leave them on screen for slightly longer. It makes sense to make them disappear when the image appears, but I fear that, as it is now, they won't work perfectly as proper primes. (not sure about @MarcoViola88 's idea of repeating AI-generated or Picture in each image's prompt; maybe it can strengthen the manipulation...even though it's wordy)
Too many images are completely unrelated to sex. I found this a bit awkward. My naive feeling was: "You're asking me about sexual arousal, why the hell am I seeing elderly couples hand in hand?! They're so tender they're actually the opposite of a sexual stimulus". I think it's ok to have some kind of baseline for the arousal, but now the DV(s) would be so (unnecessarily?) skewed! It may or may not be a problem, but I would maybe refrain from putting the participants in some cognitive dissonance-like state.
In the screen where we explain the task, for the sexy variable we say "this question is about how sexually appealing you believe this image would in itself be to people similar to you in terms of gender and sexual orientation". Just a thought on this: such phrasing will eventually lead to a correlation between the two measures. However, if we want to capture a more general difference between subjective / objective measures (or 1st / 3rd person), we should probably frame the prompt in a more general fashion, such as: "this question is about how sexually appealing you believe this image would be for an average person". For me, it would be nice to capture that maybe one deems a pic to be very arousing for others, but not for themselves, or vice versa. Does it make sense?
These are of course points of discussion. But @DominiqueMakowski , please feel free to ignore them if your time is running out.
All good, I prefer to take a few days more and have something solid than rushing things out with major flaws. thanks a lot for your time and input, I'll read that again tomorrow and will propose some further changes and will let you know :)
Some quick comments on @AleAnsani's points:
I also fear the manipulation could be weak. Some non-mutually exclusive options to reinforce it might be:
1a. Clustering photos and AI-generated images in 2 big blocks, each introduced by a cover story; or maybe in 4 small blocks (to avoid habituation).
1b. As mentioned, repeating the manipulation during the TASK INSTRUCTIONS. I disagree with @AleAnsani about wordiness: we only need to switch from "How much would you consider this image to be sexy on average?" to "How much would you consider this PHOTO/AI-GENERATED IMAGE to be sexy on average?", i.e. +0 / +1 words respectively!
1c. Watermarks? But I am not very convinced about them -- they might best be used for the authenticity manipulation, as @DominiqueMakowski convincingly argued
1d. Slightly longer presentation for the cue, as suggested by DM...
1e. ... but also, what about using different colors for AI-GENERATED & PHOTO? Or maybe an iconic logo together with/instead of the image, e.g. a camera & a stylised neural network? They might be more powerful as subliminal primes
I also felt a little bit of cognitive dissonance when presented with sensual images of horny women right after a picture of elderly couples XD that's my main reason for suggesting that the arousal rating task needs to be about the stimulus, not about one's body, since a body-centered self-report might be more influenced by carryover effects. Yet, I have no clear suggestion as to how to avoid this issue. That being said, I am not entirely sure we should remove the stimuli. I suspect that's mostly a choice based on expected statistical technicalities, hence I am glad to ultimately defer to the experts here.
I made a similar point in favor of "average person", although it risks being interpreted sometimes as "average person, irrespective of gender & orientation" and sometimes as "average person of my gender and orientation". But there is no free lunch...
That being said, I am happy to discuss whatever further issue, but equally happy to trust @DominiqueMakowski for whichever choice he could make based upon this discussion!
Okay so it seems like we're currently at ~15min with a 2-phase experiment with 54 stims. I think that's very good, we can easily throw in 1-2 questionnaires.
If we observe a small effect (or an absence) we can always try to make the manipulation stronger by going to the next level and crafting custom implicitly-priming descriptions or watermarks. It can be a natural follow-up to also see the difference between explicit dichotomous priming and something more "ecological" and realistic
different colors for AI-GENERATED & PHOTO
Good idea, will try that (the colors will be assigned randomly among, say, blue, green and red for each participant)
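A minimal sketch of that per-participant random assignment, written in Python purely for illustration (the experiment itself appears to be implemented in JavaScript, so this is not the actual code). The function name and the use of a participant seed are assumptions.

```python
import random

PALETTE = ["blue", "green", "red"]
CUES = ["AI-GENERATED", "PHOTO"]  # cue labels as written in this thread

def assign_cue_colors(participant_seed):
    """Pick two distinct colors at random and map one to each cue.

    Seeding per participant (an assumption) keeps the mapping constant
    across that participant's trials while varying between participants.
    """
    rng = random.Random(participant_seed)
    colors = rng.sample(PALETTE, k=len(CUES))  # sampling without replacement
    return dict(zip(CUES, colors))

mapping = assign_cue_colors(participant_seed=1)
```

Because the color-to-cue mapping varies randomly across participants, no single color is systematically tied to one condition, which is the point of randomizing rather than fixing the colors.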
Or maybe an iconic logo
This makes it hard to counterbalance/randomly assign, opening us up to criticism about potential confounds
we should leave them on screen for slightly longer.
Okay will do.
Clustering photos and AI-generated images in 2 big blocks
Me no-likey block paradigms 😬
(@DominiqueMakowski 's idea of priors arouses me more than the pics!)
lmao
Mmh that's an interesting one.
although it risks being interpreted sometimes as "average person, irrespective of gender & orientation" and sometimes as "average person of my gender and orientation"
Fair point...
For me, it would be nice to capture that maybe one deems a pic to be very arousing for others, but not for themselves, or vice versa
Thanks so much for the comments, it's really cool to brainstorm this. I feel like we're basically doing the job of future reviewers ourselves and making our decisions fully justified and thought through!
Keep 'em thoughts coming! we still have some time ☺️
Hi guys! I feel like I'm in uncharted territory, but "me quito el sombrero" (Spanish phrase) to the level of this discussion. I just carried out the experiment (16 minutes, which is nice), so here are some first impressions:
- I wouldn't mention that the study takes 30 minutes on the first screen. In the age of TikTok, I'm afraid that 30 minutes can freak out more than one participant. If we do mention the duration, I would be inclined to say 15 minutes.
- I agree with @MarcoViola88, I would mention country instead of ethnicity.
- I definitely share the opinion about the weakness of the manipulation. After a few trials it was easy to ignore the words and base answers entirely on the images. Or at least that is the subjective experience. As the manipulation currently stands, it could function as a variant of suboptimal priming. In that case, it may be convenient to control the affective dimensions (valence) of the cue words.
- Maybe it's me, but after a few trials it was evident that my response strategy was to first evaluate my personal rating (arousal) and, based on this rating, try to estimate the first scale (average sexy). I think that the content of the stimuli itself (sexy images) can facilitate this response pattern. Along these lines, I am not convinced by the term "average" for the objective scale. @AleAnsani Do you remember how we framed the attractiveness question in the andro study? I think we faced a similar dilemma.
Un abrazo!
@marmarini said:
The most prominent issue I encountered during the task is the potential for participants, myself included, to forget whether an image is real or AI-generated. This is a significant concern as it may undermine the results of the experimental paradigm. Participants could lose track of the nature of the images, potentially compromising the validity of the data. Could we consider displaying this information prominently above each image to mitigate this forgetting?
I found the task to be lengthy and fatiguing, particularly in relation to slider usage. I ended up rapidly moving the sliders resulting in overlooking specific values on the scale. An alternative could be the use of a Likert scale, where participants simply click on the desired value. Maybe a 9-point Likert scale could simulate a continuous variable, potentially simplifying the participant experience.
Another significant concern is the potential imbalance between erotic and non-erotic stimuli. A scarcity of arousing stimuli might lead participants to react more strongly to the limited instances of arousing content, introducing bias into the responses. This phenomenon, which I would term "the nice buttocks effect," could lead participants to overlook information regarding the nature of the image (real or AI-generated) when encountering appealing content amidst a sea of uninteresting pictures. I mean, buttcheeks are extremely salient.
Is the paradigm compatible with mobile devices and smartphones? Ensuring compatibility with these platforms would be essential for participant engagement and the practicality of conducting the experiment in various contexts, especially given its perceived length for an online experiment.
I tried to complete the experiment using a mobile device and encountered significant difficulties, particularly with slider movement and image formatting. The images were often too large for the screen, making it challenging to use the slider without zooming in. Probably, I would ensure compatibility with mobile devices, considering the prevalence of their use in online studies.
@MarcoViola88 @marmarini @AleAnsani @AntonioOLR84
I hear the concerns that the manipulation is not strong enough and people will "not read" / "forget" the prime and there won't be any effect, but I am fairly confident that even with this weak form of manipulation it will work (we have done it in the past with an even weaker type of cueing (Sperduti 2016) and it worked quite nicely). I think the sentiment might also be confounded by the fact that we as experimenters know the true nature of the stim and thus we are not engaged in the task in the same way as participants would be.
I think the only way to definitively answer that is to run the study on a "preliminary" sample and see. We can start with the UK sample while we translate the experiment, then quickly check: if it works well we deploy at full scale, if it doesn't we revise.
To be honest, I would start with restricting the experimental condition to computers. I can see the value of going multiplatform to maximize the number of participants, but it introduces a ton of issues, among which:
Again, I'd advocate a step-by-step approach, we start with computers, and if we struggle to gather enough data to estimate reliable effects, then we adjust and open-up.
Maybe a 9-point Likert scale could simulate a continuous variable
Don't say that to a statistician 😁
Yes, analog scales are slightly more tedious, but IMO that's one of their strengths. Beyond yielding a "true" continuous variable, they also avoid some response biases (in particular automated responses where people just click on pre-determined numbers, as well as response clustering where the distribution gets skewed towards some response options).
That said, I don't have a strong preference here (I just like true analog scales because it's nicer to statistically model and visualize), if the rest also think it's worth the change, we can give it a go :)
I updated the stim selection to increase the number of ero stims and decrease the neutral ones. We have now 60 stims (see here). Again, the reason for their inclusion is this:
The risk of further decreasing their number is the "inverse nice buttocks effect" whereby their salience would be inflated.
Thoughts on side measures?
Please continue arguing against/for. This dialectical process is really good :)
Hi Dominique, sorry for my silence about it -- drowning in teaching + hardcore conferencing, hope to be a tad more free next week. But luckily you've handled it splendidly! Just a few remarks:
RE: 1. Strength of Manipulation I'm actually pretty happy with these changes -- and not confident we could get MUCH MORE within the inescapable constraints of an online experiment (but probably also of some in-person experiment). With this little boost, I'm confident we will see 'something'. Might not be huge, but we neither need nor expect HUGE differences, right? So let's check this and then see what happens.
2. Multiplatform Ok, I see the problem about laptop-smartphone comparisons. TBH, I am not entirely sure that the problem you correctly identify about smartphones (e.g. getting distracted) won't apply to laptops -- personally, I am pathologically multitasking both with the smartphone and with the laptop. But I see that other problems (e.g. screen size) will matter. Hence, although recruiting participants on laptops will be slightly more difficult than recruiting them on smartphones, let's begin a 1st round of recruitment with laptops for now!
Nothing to say about 3-4, I defer to the experts. Will check 5 in a few days hopefully.
In the meantime, I'll stay updated with the others' comments!
I made a few additions:
I checked again exactly what we used in our recent FakeFace study. The goal was to add questions about people's expectations about image-generating algorithms (since we also had a cover story that we are testing an AI-image generation algorithm). But to not have these questions alone (avoid raising suspiciousness), we intermixed them with items from the GAAIS that contains general questions about attitudes towards AI.
I re-checked the items in detail and also read Schepman's revalidation paper of the GAAIS. Based on their newest data, as well as on ours, I decided to revise/improve the original combo - now named for the occasion the Beliefs about Artificial Images Technology (BAIT) questionnaire. I took 6 (3 positive + 3 negative) items from the GAAIS, + 6 BAIT-proper items. These items are aimed at measuring people's expectations regarding CGI that could interfere with our experiment design. What do you think?
With these additions, the duration should now hover close to 20 min. Unless there is something important to add, I think we're almost "feature-complete". I think we're still not 100% there with regards to the scale instructions, so please don't hesitate to scratch your brains a bit more here so that we are clear with what we are trying to do
Note that the link to the experiment has changed (but always available from the README)
@MarcoViola88 @marmarini @AntonioOLR84 @AleAnsani
Hey all! Sorry for the late reply. I took some time to participate in the new version of the study. Thanks, @DominiqueMakowski , for your incredible work! I agree that we're almost there!
Here are my comments:
Here's the reference: Hatch, S. G., Esplin, C. R., Hatch, H. D., Halstead, A., Olsen, J., & Braithwaite, S. R. (2023). The consumption of pornography scale–general (COPS–G). Sexual and Relationship Therapy, 38(2), 194-218.
I think that's it. Thank you for your time and sorry again for my late reply.
@MarcoViola88 @marmarini @AntonioOLR84
But arguably yes, this is very exploratory and this item is probably not a very good proxy of such a bodily state, but I just thought we could throw it in there as an optional question just out of curiosity ☺️ we can remove it though. What does our philosopher think? @MarcoViola88
Regarding your question, I think your justification using alpha/omega holds, as it shows that you can reduce these 6 items to obtain one meaningful score. Then you can say that it was justified based on the needs and hypotheses of your study. But then maybe the editor is a GAAIS author you never know 😬
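For reference, a minimal Python sketch of how Cronbach's alpha could be computed for a reduced set of items. The scores are made up for illustration, the function name is hypothetical, and omega (which requires fitting a factor model) is not covered here.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha.

    items: list of k lists, each holding one item's scores
    for the same participants, in the same order.
    """
    k = len(items)
    sum_item_var = sum(variance(scores) for scores in items)
    # Total score per participant (sum across the k items)
    totals = [sum(person) for person in zip(*items)]
    return (k / (k - 1)) * (1 - sum_item_var / variance(totals))

# Hypothetical data: 3 items answered by 5 participants
scores = [
    [4, 2, 5, 3, 1],
    [5, 2, 4, 3, 2],
    [4, 1, 5, 2, 1],
]
alpha = cronbach_alpha(scores)
```

A high alpha on the retained items is what would support collapsing them into a single score.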
EDIT: I'm tired - it's at the end
About the COPS
I'm surprised that the minimum option is once and not zero 🤔
About cops
My thought is that when you ask "within the past year", people draw on their "semantic Self", i.e., their beliefs about themselves "in general". When you ask within the past week, it's closer to an episodic retrieval of the actual real number, but the problem is that it becomes noisy as one's behavior in the past week is not necessarily reflective of a general tendency. So maybe that's why the "within the past month" taps on a sweet spot between generalizability and noise?
Same for the duration items, I find the last one fairly irrelevant (despite its high loading) Additionally, item 1 and 2 are potentially overlapping and indiscriminate, as many people only view porn on websites. So I think here too I'd stick with the first item 🤷
Hi guys, thank you for the splendid fine-tuning :-)
Since I'm pretty convinced by your 'trimming' of the scales (& not authoritative enough when it comes to statistics, sadly), I won't add anything there -- I like the idea of adopting 'slimmer' versions of several scales. (BTW, I love the idea & acronym of the 'BAIT' scale!)
But let me express my (partial) skepticism about horniness.
[Question](https://github.com/RealityBending/FictionEro/blob/d8eeb03ad7beed473bc2d004ee75a4f143b39af3/experiment/demographics.js#L202-L203) about when someone last had sexual activity (masturbation or sex), this is also related to the previous. Essentially meant as a proxy of "horniness".
This is not a good proxy of horniness. I might be a very horny person who does not engage in sexual activity (incl. masturbation) because I have no time. I might be horny and HENCE have frequent sexual intercourse; OR, I might be horny because I have not had sexual intercourse / masturbated recently. And so on. I might have had sexual intercourse despite my lack of interest, just to please my partner (creepy as it sounds, it happens, I guess...). In sum, I am not convinced this is a good proxy. Perhaps a better way would be to ask it straight away: how "horny" do you take yourself to be in general? I see some drawbacks to this sort of question, of course, but it still seems preferable to using such a spurious 'behavioral' proxy...
ANOTHER ISSUE JUST POPPED TO MIND: We should instruct people to run this experiment when they are alone. In fact, they can be embarrassed by seeing buttocks in public -- but that's THEIR problem. OUR problem, on the other hand, is that having an audience (or not) might interfere with the viewing experience. One might engage less with intimate pictures if a crowd is watching, I suspect. Now, having no subjects doing the experiment on a mobile phone mitigates the risk; but maybe being explicit in the instructions about being alone during completion (and leveraging the fact that they're about to see some NSFW material...) is in order. Isn't it? Or we could at least/also check whether subjects were alone during the experiment with a post-hoc question.
I guess that after this latest fine-tuning we're almost done, aren't we?
I implemented most changes, added a few screens here and there to fluidify the experience, finalized the consent form & debriefing, adjusted the order of things, grouped/conditionally displayed items (e.g., birth control) etc. I am quite happy with it :)
Things of note:
Please give it a go: https://realitybending.github.io/FictionEro/experiment/english1.html I'll wait for your green light with the aim of sending the ethics application by the end of the week (I know I've said that every week but now it's real - I think)
Note: after a brief survey with native speakers, we changed the "Picture" cue in favour of "Photograph" which seems to be more "reality-loaded"
Don't hate me but after further thinking, I think it would be incomplete, especially given the presence of non-ero stims, to not have a question about emotional valence. It offers a new dimension of pleasantness/unpleasantness that could capture reactions to non-ero stims as well as ero images that one could judge as disgusting etc.
We piloted this new version on a couple of people and the duration seems to be ~24min (adding the 3rd scale probably adds ~2min, but I think it's worth it?), but the feedback was that it's rather fun and not too long 🤷 (it's the first time I personally ever run a study this short 😁) We asked whether the scales were well-explained and "made sense" and it seems alright. We will still run a couple of pilots just to make sure everything is alright once I have your greenlight 🚥
Hi guys, I've done a test today -- apologies for postponing it so much, I was kinda overwhelmed by teaching duties until a week ago :-S My impression is overall very positive: many doubts I had reading instructions in 3rd person were solved by tackling the exp in 1st person. My hunch (or "phenomenal experience" if you feel philosophical :-P ) is that the manipulation DOES WORK. Although I knew the paradigm, I couldn't resist the impression that some images are artificial -- and that triggered something (sometimes, uncanny feelings!). Moreover, while it complicates things a bit, the third axis (VALENCE) might capture some interesting dissociation, e.g. "it looks enticing but since you told me it's fake I'm feeling unpleasant". A few minor issues I invite @DominiqueMakowski to consider (but I won't enforce them):
Whatever Dominique decides to do with the points above, I think we are ready to move to the next step. Dominique, have you already submitted the application to the IRB? In order to begin data collection we need to decide on a protocol (e.g. snowball sampling? only free participants or paid participants too? and so on). Moreover, we need to translate the experiment into Italian & Spanish -- I'm ready & willing to do that in the next few days!
PS happy new years guys!
I had reading instructions in 3rd person were solved by tackling the exp in 1st person
Nice decentering skills, we should EEG you while you adopt the two mindsets ^^ Thanks Marco for the thorough testing
We did submit a first ethics application, we hope to hear back once people are back from the break :) Once it is cleared I'll add the document to the repo and will let you know. Here, as soon as we're good to go the students will start convenience sampling to recruit as much as they can (online, free). We'll see how it goes
In the meantime, we can start the translation. For this, one needs to create a copy of this file, named e.g., instructions_italian.js, and translate the text. Then, we'll have scripts simply loading different sets of instructions in different languages (but the experiment code itself is the same, to ensure we don't make errors or changes in one version that we don't port over to another version)
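A minimal sketch of that loading idea, assuming a two-letter language code is mapped to the file naming used above. The lookup table, the function name, the Spanish file name and the English fallback are all hypothetical, not taken from the repo:

```javascript
// Hypothetical lookup from language code to instruction file.
// Only the "instructions_<language>.js" naming comes from this thread;
// "instructions_spanish.js" is an assumed name.
const INSTRUCTION_FILES = new Map([
    ["en", "instructions_english.js"],
    ["it", "instructions_italian.js"],
    ["es", "instructions_spanish.js"],
])

// Return the instruction file for a language code.
// Unknown or missing codes fall back to English (an assumption).
function instructionsFile(lang) {
    const key = String(lang || "").toLowerCase()
    return INSTRUCTION_FILES.get(key) || INSTRUCTION_FILES.get("en")
}

console.log(instructionsFile("it"))
```

Keeping the experiment code identical and swapping only this one file is what prevents the language versions from drifting apart.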
Hi guys! I hope the vacation lived up to expectations. In my case, that's how it was 😎. Together with Guido we translated the instructions (attached). Perhaps the most sensitive adjustment is the name of the scale "Enticing." Both Guido and I agree that the literal translation in Spanish has Catholic connotations ("fall into temptation"), which is why we consider that the label "Atractivo/apetecible" better fits what we want to measure. Let us know what you think. Un abrazo! Antonio Instrucciones Spanish.docx
the ethics has been approved 🥳 (it took more time than expected as the committee got reorganized just before Christmas).
So we will start collecting some data asap (PS: do send me your OSF accounts names if you have ones so I can add you to the OSF data repo). I made a few improvements to facilitate the tracking of the "source" of participants (and avoid needless duplication of HTML files):
the experiment now collects and saves 2 "URL variables", exp (researcher/source) and lang (language).
So the new URL is now https://realitybending.github.io/FictionEro/experiment/english?exp=TEST&lang=en (note the question mark ? and then the URL variables), so when we want to test the experiment, we can write TEST and then filter those runs out of the data. The link at the end of each experiment that invites participants to share it with others has exp=snow for "snowball". If I post the link on Twitter/X I could add exp=domx. If we put the experiment on some platform we will do exp=prolific. It's basically just to keep track of where the participants come from so we can trace the collection history.
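For anyone curious how such URL variables can be read, here is a rough sketch using the standard URL API. The function names, the fallback values and the shape of the saved rows are assumptions for illustration, not the repo's actual code:

```javascript
// Read the two URL variables (exp, lang) from a full URL string.
// The fallbacks ("unknown", "en") are assumptions, not the repo's values.
function readUrlVariables(url) {
    const params = new URL(url).searchParams
    return {
        exp: params.get("exp") || "unknown",
        lang: params.get("lang") || "en",
    }
}

// Drop test runs (exp=TEST) before analysis. `rows` is a hypothetical
// array of saved records that each carry an `exp` field.
function dropTestRuns(rows) {
    return rows.filter((row) => row.exp !== "TEST")
}

const urlVars = readUrlVariables(
    "https://realitybending.github.io/FictionEro/experiment/english?exp=TEST&lang=en"
)
console.log(urlVars.exp, urlVars.lang)
```

In the browser the same function could be called with window.location.href.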
Within the next few days/weeks (I need to finish preparing a module first), I'll set up the Spanish & Italian versions so that we then just need to replace the text
Hi guys, Thank you for the updates!
I'm finalizing the translation in ITA, but I'd like to ask Marco &/or Alessandro to check them before uploading it. Before the data collection, let me check the following: do we want to include someone else (as mere data collector) from other countries AND/OR provide other translations? (e.g. Dominique, do you want to include someone from your past affiliation in France/Poland/Asia?) I think this is NOT necessary, but I ask just because I suspect this is the last good moment to do so (... or not?)
BTW, here is my OSF profile: osf.io/e6js2
Hi guys,
Fantastic news about the ethics approval! Thanks for letting me know about the tweaks you made. Here's my OSF profile link: [https://osf.io/b4th6/]. Marco, I'm ready for text-reviewing duties whenever you need a hand.
Huge thanks for all your hard work. Can't wait for this data collection to kick off! Marco
let me check the following: do we want to include someone else (as mere data collector) from other countries AND/OR provide other translations?
Good point, let me drop an email to Sperduti to see if he has some bandwidth at the moment to run a French arm :)
Hi guys, sorry I disappeared (again), I've just moved back to Jyväskylä. I tested the exp, it took me around 20 minutes. I find the whole procedure convincing, just three comments:
What do you guys think? Anyway, these are minor things, and I wouldn't object to moving on to the next phase without making these tiny adjustments.
(Sorry about any possible mistake here, it's a bit late and I'm sleepy, but I wanted to give you my take on the exp)
P.S.: My OSF profile is: osf.io/47v9u
Thanks @AleAnsani
"after each image, you will have to..." part, but doing something like "after each image is presented for a couple of seconds, you will have to" makes it more confusing). But while it's true that it might surprise some participants the first time, I think they'll understand that it's how it's meant to be after a couple of trials + We started collecting and we have already a couple of participants so far ^^ so let's just roll for now and see how it goes
instructions_*.js. You can start testing the Italian version, and let me know where there is remaining English so I can enable its translation :)
Hi,
Hence, we are virtually almost ready to begin data collection in Italy.
Nice!!
(e.g. could we also try to get some Prolific participants? Are we aiming mainly for 20-30yrs students?)
Well afaic the only limitation for prolific is money 😁
Just for budget calculation, they say " We recommend you pay participants at least £9.00 / $12.00 per hour, while the minimum pay allowed is £6.00 / $8.00 per hour."
So for a half-an-hour experiment it would be £3-4.50, + prolific fees. I'd say £4/4.70€ (or £4.30/5€) per participant in total (including prolific's cut) is fair. So we have to budget around 50€/10 participants.
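The arithmetic behind that estimate, as a small sketch. The fee rate is left as a parameter because the thread does not state Prolific's cut, so any value passed in is a placeholder to check against their current pricing:

```javascript
// Cost per participant for a study lasting `minutes`, paid at `hourlyRate`.
// `feeRate` is the platform fee as a fraction of the payment; it is a
// placeholder here, since the actual Prolific fee is not given in this thread.
function costPerParticipant(minutes, hourlyRate, feeRate) {
    const payment = hourlyRate * (minutes / 60)
    return payment * (1 + feeRate)
}

// Half-hour study at the minimum (£6/h) and recommended (£9/h) rates, before fees
const minimumPay = costPerParticipant(30, 6, 0)
const recommendedPay = costPerParticipant(30, 9, 0)
console.log(minimumPay, recommendedPay)
```

Multiplying the result by the planned number of participants gives the total budget for that arm.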
I'd like to know which criteria we should aim for (if any)
Which makes me think that we kinda-sorta forgot to preregister 😱 Either we do it now (but we say data collection is on its way, but not processed); or we preregister the non-English versions "separately"? I don't think it's that big of a deal, since our hypotheses are quite obvious and clearly stated, but well, better to do it than not.
As for the inclusion criteria, what do you have in mind?
translation
Thanks for the detailed issue on the translation, I'll try to fix the remaining points
Hi guys, this is definitely good news. Since the "original" consent is already approved, I am going to check with the ethics committee of my university whether this approval is sufficient to cover the students I have. I share @MarcoViola88's observation: is it feasible to include my contact name in the Spanish version? I think this would also facilitate dissemination, since it would be easier to see that the university is part of the research. I agree 100% with @MarcoViola88 that @DominiqueMakowski's tremendous effort should be reflected in the authorship of the article. I will be waiting for any questions regarding the Spanish version. Un abrazo, Antonio
include my contact name in the Spanish version
of course, feel free to even remove me if you prefer, I don't mind (it's just that I need to be there for the English version as per ethics approval but otherwise I really don't care)
Nice to meet you all, and thanks for the collaboration proposal. I've read the whole brainstorming, and I think that you've already discussed the major issues there and found the optimal solutions. The next step for me, if I've understood correctly, is to create a French version. I will probably have to apply for ethical approval at my university. But I hope that they will be indulgent, since there is already an ethical approval from another university.
Hello Marco, happy to have you onboard :-) And BTW, since this brings the # of Marcos involved in this study to 3 (including myself and Marco Marini), I propose to include last names in further communications XD
I see from OSF that data collection is going smoothly in UK. Good to know! Let me know if we can proceed to setup the data collection strategy in Italy too (@marmarini or @AleAnsani will do another check to the Italian version of the exp to see if there are still open issues to be fixed). Have we set an ideal # of participants to collect for each country, e.g. via Power Analysis?
It'd be nice to have a chat about power analysis with @DominiqueMakowski. I don't know if we're going to go for GLMMs or go Bayesian. In the first case, power calculations are a bit blurred (in that, as far as I know, there's no consensus on the ultimate way to compute power (some indications here, here, and here); simr might be a good R tool for simulation, but again, there's no consensus). To be safe, we could just stick to what we did in Marini et al. (2024), but again, @DominiqueMakowski is the absolute master here =)
Apart from that, I started drafting an R code to merge the CSVs and do some data cleaning, but I stopped when I realized that the response variable needed some JSON manipulation (i.e., all the DVs are within the same cell, whereas I would place their values in 3 different columns). I hope I'll continue in the next few days, although idk if I'll have time enough very soon. But wouldn't it be great if we had an R script in the data folder to analyze data in real-time? (...yes, I'm teasing you all!)
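The JSON step itself is small. Here is a rough sketch of the idea, written in JavaScript rather than R to match the experiment code; the column name `response` and the three keys inside it are assumptions about the saved data, not checked against the actual CSVs:

```javascript
// Spread a JSON-encoded `response` cell into one column per rating.
// The column name `response` and the keys inside it are hypothetical.
function flattenResponse(row) {
    const { response, ...rest } = row
    const ratings = typeof response === "string" ? JSON.parse(response) : response
    return { ...rest, ...ratings }
}

// One made-up trial row, with all three DVs packed into a single cell
const exampleRow = {
    participant: "p01",
    response: '{"Arousal": 3, "Enticing": 5, "Valence": 4}',
}
console.log(flattenResponse(exampleRow))
```

The same logic in R would parse each cell and bind the parsed values as new columns.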
P.S.: Welcome on board @marcosperduti !! We're so glad to have you here!! Best of luck with the ethical approval ;)
Best of luck with the ethical approval ;)
Thank you. BTW did you ask for ethical approval at your university? Or are you running the study under Dominique's ethical approval?
@marcosperduti as for me, I haven't requested any approval from Jyväskylä. I don't think I'll recruit participants through their official channels, so I don't think I need it. But you raised a good point, maybe I'd better ask for this...
Need to translate these variables. Once it's done and we're ready to deploy, I will uncomment the saving of the data for the italian version and we can proceed
You can copy and paste that instructions_english.js file, open it with a text editor (or VS Code in my case, or even RStudio), and translate all the variables (beware of the HTML syntax that indicates the formatting). Then you paste it here in this discussion or send it via email and I'll add it to GitHub and create the links for the experiments
I don't like power analyses, I think they rely on absurd assumptions, tend to give non-realistic estimates, and simply don't scale well with the type of analysis that we want to do (i.e., go beyond t-tests and correlations). I tend to be from the school of "collect as much as you possibly can and then do the right stats to reliably estimate the uncertainty and incorporate that in the interpretation and discussion". (See this and this that just came out; as well as this). BUuuut that's only my personal opinion so by all means feel free to disagree and do what you think is right 😁 Possible options I see are:
I don't know if we're going to go for GLMMs or go Bayesian
Bayesian GLMMs 🙃 (though TBF I expect Frequentist GLMMs will give the same evidence)
I see you're all hungry for some data eh ^^ I posted in analysis/:
token="" variable to allow it to access the storage. But this script doesn't do much besides processing anyway so I wouldn't bother trying to run it, I'll be running it every now and then to update the data and you can focus on the processing per se. It saves the data on GH as rawdata_* (one file for the task, one for the participants' info)
If possible, let's not post the analysis scripts publicly yet (because as it's the job of the supervision students I don't want them to have everything already done :) - we can discuss results and show graphs etc. but just don't reveal the code ^^
Note:
😅
- Italian version
Need to translate these variables.
Sorry, for some reason (e.g. being a total noob) I cannot edit (cf. below) Let me place the translation here, after a slash, i.e. "/":
var button_continue = "Continue" / "Continua"
var button_end = "End" / "Fine"
var demographics1_preamble = "Please answer the following questions:" / "Per favore, rispondi alle seguenti domande:"
var demographics_q_sex = "What is your biological sex?" / "Qual è il tuo sesso biologico?"
var demographics_c_sex = ["Male", "Female", "Other"] / ["Maschio", "Femmina", "Altro"]
var demographics_q_edu = "What is your highest completed education level?" / "Qual è il titolo di studio più alto che hai conseguito?"
var demographics_c_edu = [
"University (doctorate)", / "Università (dottorato)",
"University (master) or equivalent", / "Università (laurea magistrale) o equivalente",
"University (bachelor) or equivalent", / "Università (laurea triennale) o equivalente",
"High school or equivalent", / "Scuola superiore o equivalente",
"Primary school", / "Scuola dell'obbligo",
"Other", / "Altro",
]
var demographics_q_age = "Please enter your age (in years)" / "Per favore immetti la tua età (in anni)"
var demographics_p_age = "e.g., '31'" / "per es. 31"
var demographics_q_eth = "Please enter your ethnicity" / "Per favore immetti la tua etnia"
var demographics_p_eth = "e.g., Caucasian" / "per es. Caucasica"
var demographics_q_cou = "In which country do you currently live?" / "In che paese risiedi attualmente?"
var demographics_p_cou = "e.g., UK, Spain" / "per es. Italia, UK"
var demographics_q_lang = "How would you rate your level of English?" / "Come giudicheresti il tuo livello di Inglese?"
var demographics_c_lang = ["Beginner - 0", "1", "2", "3", "4", "5", "6 - Native"] / ["Principiante - 0", "1", "2", "3", "4", "5", "6 - Madrelingua"]
var demographics_q_ai =
"How knowledgeable do you consider yourself about Artificial Intelligence (AI) technology?" / "Quanta familiarità ritieni di avere con l'Intelligenza Artificiale? (IA)"
var demographics_c_ai = ["Not at all - 0", "1", "2", "3", "4", "5", "6 - Expert"] / ["Nessuna - 0", "1", "2", "3", "4", "5", "6 - Molta"]
var demographics_hormones_preamble =
"The following question is important to understand the role of potential biological factors in our study.
It is however optional, and you can skip it if you want." / "La prossima domanda ci interessa al fine di comprendere il ruolo potenziale di fattori biologici nel nostro studio.
Si tratta però di una domanda opzionale, se vuoi puoi saltarla."
var demographics_q_hormones =
"If you are a female, are you currently using birth control treatment?" / "Se sei femmina, stai attualmente utilizzando dei trattamenti contraccettivi?"
var demographics_c_hormones = [
"No", / "No",
"Yes - contraceptive pills (combined pills)", / "Sì - pillole contraccettive (pillole combinate)",
"Yes - contraceptive pills (progestogen-only pills)", / "Sì - pillole contraccettive (pillole a base di solo progestinico)",
"Yes - intrauterine device (copper coil, IUD)", / "Sì - dispositivo intrauterino (spirale, coppetta intrauterina)",
"Yes - intrauterine system (IUS)", / "Sì - dispositivo intrauterino a base di ormoni (IUS)",
"Yes - female condoms", / "Sì - condom femminili",
"Yes - condoms for partner", / "Sì - condom per il partner",
"Yes - other", / "Sì - altri",
... Or please kindly suggest me how to make the changes myself !
weird why it doesn't let you (the mysteries of github), anyway I updated the file, let's make sure we didn't forget any line or button
In the instructions, DM proposes the following 3 scales:
Sexy: How sexually appealing do you think the image is.
Arousal: How much you felt your body to react to the image
Convincing: In case you spot any artifacts or problems with the image that made it look fake (mostly applies to the artificial images)
Let us start with the latter. In the first implementation, Convincing is not asked to avoid triggering distrust in our manipulation. An idea might be to split the experiment in 2 tasks:
PROS: we might check whether people believed in our implementation; we might collect belief-sexiness scores.
CONS: I suspect that the 'déjà vu' effect might somehow enhance credibility via familiarity (as it happens with faces: I tend to trust more those people I saw before)
Sexy: How sexy this image would be perceived for an average viewer of the congruent sexual orientation. Focus on the third person.
Arousal: How much you felt your own body to react to the image. Focus on the first person.