Conceptualization Discussion

RealityBending / FictionEro

https://realitybending.github.io/FictionEro/


Conceptualization Discussion #2

Closed MarcoViola88 closed 5 months ago

MarcoViola88 commented 1 year ago

In the instructions, DM proposes the following 3 scales:

Sexy: How sexually appealing do you think the image is.
Arousal: How much you felt your body to react to the image
Convincing: In case you spot any artifacts or problems with the image that made it look fake (mostly applies to the artificial images)

Let us start with the latter. In the first implementation, Convincing is not asked, to avoid triggering distrust in our manipulation. An idea might be to split the experiment into 2 tasks:

first, a "how sexy/arousing" rating task with the complete stimuli set
second, a "how convincing" task with only the stimuli we presented as AI-generated

PROS: we might check whether people believed in our implementation; we might collect belief-sexiness scores.
CONS: I suspect that the 'déjà vu' effect might somehow enhance credibility via familiarity (as happens with faces: I tend to trust more those people I have seen before)
Now, let's consider Sexy / Arousing. Do we want 2 separate measures rather than a single, undifferentiated "sexual arousal" measure?

On the one hand, having 2 ratings can help disentangle 1st person/3rd person. They also somehow align with the literature on AI-generated arts, where experimenters often ask both "how beautiful is it?" & "how much do YOU like it?". Moreover, it could be interesting to analyze whether the two correlate or not. Not to mention that arousal measures can be directly compared with normative ratings from the EroNAPS.

On the other hand, they might slow down the experiment a bit. This might not be a problem if we cut down the stimuli a bit (cf. the other issue). I am more concerned about the fact that, as they are currently presented, the instructions may under-constrain the construct we want to measure. To put it simply, I suspect that if we ask a single self-report measure people will somehow "put everything" in there, whereas if we give two measures they could arbitrarily interpret them, and each one would be reporting something different (e.g. how sexually appealing for me VS how sexually appealing for people in general VS how sexually appealing for people of your gender and sexual orientation etc.). Of course, this worry can be mitigated by more specific phrasing. Two sketches below:
Sexy: How sexy this image would be perceived for an average viewer of the congruent sexual orientation. Focus on the third person.
Arousal: How much you felt your own body to react to the image. Focus on the first person.

DominiqueMakowski commented 1 year ago

Realness

Fair points. We could put the convincing/realness scale on the first viewing only in the fiction condition to avoid the double presentation, what do you think? That'd mitigate the overall suspicion issue.

sexy/arousing

I really like how you framed that and indeed I didn't think of it in terms of 1st/3rd person + aesthetic studies.

a single self-report measure people will somehow "put everything" in there

I feel like if we want this study to be a step forward from our previous work we should aim at being a bit more refined in terms of our exploration of the psychological mechanisms hypothesized to be at stake. Like in this case, having 2 variables would make the data more rich (we can average them - if they are very correlated - to get an even more robust general measure; and analyze them separately - which I'm optimistic would yield some interesting dissociation). We could potentially explore the idea that the effect of fiction on emotion is related to a lowering of "Self" engagement, and thus would be more marked in the subjective scale than in the more objective one.
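To make the "average them if they are very correlated, otherwise analyze them separately" logic concrete, here is a minimal Python sketch. The ratings, the 0.7 threshold and the function names are all hypothetical, chosen only for illustration; none of this comes from the actual analysis plan.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def composite_or_separate(arousal, sexy, threshold=0.7):
    """Return (r, composite); composite is None when r is below the threshold."""
    r = pearson_r(arousal, sexy)
    if r >= threshold:
        # Strongly correlated: average the two scales trial by trial
        return r, [(a + s) / 2 for a, s in zip(arousal, sexy)]
    # Weakly correlated: keep the two scales separate
    return r, None

# Hypothetical per-trial ratings on a 0-1 analog scale
arousal = [0.1, 0.4, 0.5, 0.8, 0.9]
sexy = [0.2, 0.3, 0.6, 0.7, 1.0]
r, composite = composite_or_separate(arousal, sexy)
```

The threshold is an arbitrary assumption; in practice the decision would more likely rest on a proper model comparison than on a single cut-off.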

they might slow down the experiment a bit

It takes <1.5 seconds to answer a scale once participants are familiar with the instructions, so that's like + <2min total, which I think is worth it.
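As a rough check of that estimate, here is a small Python sketch of the arithmetic. The function name is made up, and the stimulus counts (80 and 54) are the ones mentioned elsewhere in this thread, used here as assumptions.

```python
def added_minutes(n_stimuli, seconds_per_rating=1.5, extra_scales=1):
    """Upper bound on the task time (in minutes) added by extra rating scales."""
    return n_stimuli * seconds_per_rating * extra_scales / 60

print(added_minutes(80))  # 80 stimuli, one extra scale -> 2.0
print(added_minutes(54))  # 54 stimuli, one extra scale -> 1.35
```

Since 1.5 s is itself an upper bound per rating, the real cost should come in below these numbers.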

if we give two measures they could arbitrarily interpret them, and each one would be reporting something different

Agreed, we need to work on the phrasing here

"Focus on the first/third person" is a bit abstract (at least to me ^^), maybe something like:

Arousal
- Maybe we should ask this scale first, it's easier to then dissociate the 3rd person
- Instructions: How much do you find the image arousing? Did you feel a reaction in your body (whether positive or negative)? This question is about your own personal reaction. [I'm not sure about the "positive or negative" part: in theory arousal is explicitly dissociated from valence, but since we don't ask about valence, IDK.]
- Prompt (in the task) How much did you feel a reaction to the image in your body?
Sexy:
- There is an implicit (IMO useful) distinction that we create by using the terms "feel" above and "think" below.
- Instructions: This question is about how sexually appealing the image is. In other words, how sexy do you think the image would be perceived by an average viewer similar to you in terms of gender and sexual orientation?
- Prompt: Do you think this image would be considered sexy by others?

Let's continue refining :)

MarcoViola88 commented 1 year ago

You totally convinced me about the 2 stimuli, thank you.

Realness Fair points. We could put the convincing/realness scale on the first viewing only in the fiction condition to avoid the double presentation, what do you think? That'd mitigate the overall suspicion issue.

I'm still a bit concerned about asking realness together with the other 2 ratings, even if only for AI-generated stimuli. There are 2 reasons: (i) isn't that a bit hard to judge in just 2-3 secs? (ii) wouldn't asking that before / after asking to rate sexiness and arousal be influenced / influence the process of making explicit and conceptualising how arousing & how sexy we find the stimulus? That being said, when I say that I am "a bit concerned", the "bit" is not rhetorical: my ideas are not 100% clear about this, so I am open to arguments (or even to deferring to the judgment of whoever has a clearer idea in mind).

DominiqueMakowski commented 1 year ago

wouldn't asking that before / after asking to rate sexiness and arousal be influenced / influence the process

Yes you're right about the fact that it potentially creates a confound...

Mmh

Okay then, maybe we re-lower slightly the number of stims (not 100% sure that's necessary, we can try first with the 80 and see how much time it adds up) and, indeed, have a second run at the end where we show (all?) the pictures and we say "In this second phase, we are evaluating the quality of our algorithm, and would like to ask you whether you notice any issues or problems with the images presented to you in the previous phase." and have a rating of like "Obviously fake - Very realistic".

My hope is that we would expect: 1) overall, most (fiction-condition) rated as very realistic, and if not we can use that data to filter out some items. 2) a possible lingering difference between items presented as real and fake, the latter being rated slightly less realistic than the ones presented as photos. This would show a deep effect of our manipulation that affects (from memory) the re-exposure

One issue is that one could argue that the ratings of arousal are then influencing the a posteriori ratings of reality, but that's fine as 1) the former is our primary target of interest, 2) we can statistically check the effect of arousal on reality beliefs and its interaction with the condition, and 3) if that proves to be a big issue we can always run a subsample in the future with the order of the tasks reversed and see.
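A very rough Python sketch of what "checking the effect of arousal on reality beliefs and its interaction with the condition" could look like descriptively: compare the slope of realness on arousal between conditions. The data and names are invented, and a real analysis would presumably use a proper (mixed) regression model with an interaction term rather than two separate slopes.

```python
def slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

# Hypothetical trials: (cue condition, arousal rating, realness rating)
trials = [
    ("photo", 0.2, 0.80), ("photo", 0.5, 0.85), ("photo", 0.8, 0.90),
    ("ai", 0.2, 0.50), ("ai", 0.5, 0.65), ("ai", 0.8, 0.80),
]

def slope_by_condition(trials):
    """Slope of realness on arousal, computed separately per condition."""
    slopes = {}
    for cond in {c for c, _, _ in trials}:
        x = [a for c, a, _ in trials if c == cond]
        y = [r for c, _, r in trials if c == cond]
        slopes[cond] = slope(x, y)
    return slopes

slopes = slope_by_condition(trials)
# A non-zero difference in slopes is the descriptive analogue of an interaction
interaction = slopes["ai"] - slopes["photo"]
```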

MarcoViola88 commented 1 year ago

Arousal: Maybe we should ask this scale first, it's easier to then dissociate the 3rd person

Sounds reasonable, right.

Instructions: How much do you find the image arousing. Did you feel a reaction in your body (whether positive or negative) This I'm not sure but in theory arousal is explicitly dissociated from valence, but since we don't ask about valence IDK? This question is about your own personal reaction.

Mmm, I would avoid getting into trouble by opening the Pandora's box of valence, i.e. I'd avoid mentioning positive/negative.

Prompt (in the task) How much did you feel a reaction to the image in your body?

Sexy: There is an implicit (IMO useful) distinction that we create by using the terms "feel" above and "think" below. Instructions: This question is about how sexually appealing the image is. In other words, how sexy do you think the image would be perceived by an average viewer similar to you in terms of gender and sexual orientation?

Cool, I agree that feeling/thinking would do a good job subliminally.

Prompt: Do you think this image would be considered sexy by others?

Despite having proposed the notion of "average viewer similar to you in terms of gender and sexual orientation" myself, I am not 100% sure we should keep it. I suspect that keeping it = improving accuracy. But by refraining from specifying who the average observer is (i.e. one with the same sexual orientation), we would probably get a stronger dissociation.

I'll stop here for now so as to allow others to express their thoughts!

AleAnsani commented 1 year ago

Hey guys, I read almost everything and I'm ok with the direction the study is taking (ok for the 4x2 paradigm!). Here are a bunch of things I'd like to say:

1 - About the individual traits scales (GAAIS, etc.): I bumped into this very interesting preprint, whose name is encouraging: "Can we assess attitudes toward AI with single items?" Apparently, the authors are confident we can. I see the importance of replicating the GAAIS (non)results, but I also see the problem of increasing the duration of the exp. My golden rule would be: anything < 20 minutes is ok (see also Revilla & Höhne, 2020).

2 - I like the fact that the image remains on the screen for a couple of seconds. It's different from our previous work, but I find it crucial, especially if we have images of couples, which are less likely to be perceived as AI-generated. One thing I'd slightly change is the presentation; I'd say: 1: AI-generated/Real > 2: the photo appears (but AI-generated/Real remains on the screen with the photo) > 3: everything disappears

3 - I see and appreciate Dominique's bottom-up logic saying: if the two measures are correlated, we average them; if they're not, even better: we have two potentially different measures. However, and I'd like to ask this to our philosophers here: let's assume that we average the two measures: what are we assessing, exactly? Can we think about some underlying factor?

4 - As for the main measures: I like the 1st/3rd person phrasing and the feel/think differentiation. However, I would avoid lengthy prompts and especially the word "body". I suspect youngsters feel the body less, and/or are less prone to say that something was moving in their body (sounds a bit pervy?). This would lead to a potential floor effect in the arousal measure and thus a decrease in the correlation of the two measures.

I'd go for something like:

Arousal How much did you feel sexually excited by this image?

Sexy Do you think others* would consider this image as sexy?

*"Others" has some issues too: who are the others? What are their sexual orientations? Here are two alternatives:

How much do you think that other people would consider this image as sexy, on average? (wordy)
How sexy do you think others would find this image, on average?

5 - I love Dominique's idea about the "we are evaluating the quality of our algorithm" thing!

Hopefully, tomorrow I'll read everything again and come up with new suggestions. @DominiqueMakowski when's your deadline?

Bisous, besos, and baci =)

Revilla, M., & Höhne, J. K. (2020). How long do respondents think online surveys should be? New evidence from two online panels in Germany. International Journal of Market Research, 62(5), 538–545. https://doi.org/10.1177/1470785320943049

DominiqueMakowski commented 1 year ago

Thanks @AleAnsani great comments ☺️

I made some changes:

Cut the number of stimuli to 54
(Slightly) rephrased some of the descriptions
Added a Phase 2 with the realness check
Added a debriefing to check their suspiciousness (with different options so that we don't raise suspicion merely by asking)

Can you give it a try and let me know how long it currently takes, so that we have a better idea? Note: if you want to check specific things, you can skip the image & fixation cross by pressing "S"

I'd still push back on having overlap text + image:

1) Although we're not doing neuroimaging or eyetracking here (in which case it would be problematic as we don't know whether they are reading or watching), I'd still prefer to have them separate to have theoretically disentangled processes.
2) I have trouble seeing the benefits of having the text on, as a "reminder" I suppose, but I really doubt that they'll forget the cue. I'd even say it could add some noise: what (I think) we (at least me ^^) are trying to see is the effect of a priori beliefs of fakeness (expectations, or "priors" in a Bayesian sense). Hence the cue beforehand that primes the participants. But if we have it overlaid, some weird interaction can (more easily) take place: people can look at the image, then re-read the text, then potentially feel conflicted, then re-look at the image and try to solve their mental conflict etc. etc. What I'm trying to say is that if we have the image + the text we know even less what participants are doing, which is not paradigmatically optimal.

MarcoViola88 commented 1 year ago

Hi guys, thanks everybody for the intense brainstorming, and double thanks to Dominique for trying to come up with some synthesis & implementation! Given that we need to settle on something, I'd try to aim at consensus. This morning I tested the exp. After 2 min on mobile, I restarted on a laptop and completed it in 14 mins -- which should be fine. Here are some observations:

- On mobile, I'd add some border on the left & right of the informed consent to improve readability.
- In the instructions, there is still "Picture" (whereas the cue for real images is "Photo"), so it should probably be changed.
- Also in the instructions, in "high quality erotic (but also non erotic content)", the bracket should probably be closed a little earlier.
- When asking ethnicity, I would use a fixed-choice menu rather than allowing inputs. But maybe it's even better to ask for countries rather than ethnicity.
- We should probably ask for some info on sexual orientation (either via inventories or with a single question?).
- Based on Dominique's reflection that people would probably find it easier to start from their subjective feeling before answering an abstract question, I agree that we should probably invert the order of SEXY / AROUSAL, both in the instructions and in the task.
- When we ask about arousal, I am happy not to mention the body. However, since the "average sexiness" is ABOUT the image, I would also specify, for the sexual arousal question, that our question is about the image (in Russell's theory terms, we want to measure Perception of Affective Quality, not Core Affect). Hence, instead of "How much do you feel sexually aroused" I propose "How much do you feel sexually aroused by this image".

Concerning the "keep the cue" issue, I am conflicted. On the one hand, I can see Dominique's concerns that keeping the cue alongside the image as proposed by Alessandro might distract people (are they looking at the image or at the cue?) and ultimately elicit different processes. On the other hand, having performed the task, which is rather repetitive, I noticed that I hardly paid the cue enough attention. However, here is a way out I thought about (provided it can be coded without excessive effort): we should probably give a CUE before and then reinforce the manipulation AFTER, i.e. during the instructions. In other words, rather than "How much would you consider this image to be sexy on average" / "How much do you feel sexually aroused by this image", I propose we ask: "How much would you consider this PHOTO to be sexy on average" / "How much do you feel sexually aroused by this PHOTO" or "How much would you consider this AI-GENERATED IMAGE to be sexy on average" / "How much do you feel sexually aroused by this AI-GENERATED IMAGE". Maybe the capitalization is too much, but I would reinforce the manipulation, which otherwise risks being a little too mild -- a concern coming from the small results we found in the (Marini et al.) paper currently under review.

MarcoViola88 commented 1 year ago

In any case, given that Dominique is putting a lot of effort and thought into it, I think it's good that we discuss, but when it comes to deciding what to implement I trust his judgment -- and in the worst-case scenario we could switch the variables at some point (provided we stay sufficiently vague during the IRB/preregistration stage, that shouldn't be impossible, right?).

AleAnsani commented 1 year ago

Hello everyone! Sorry for taking so long. I've tried the exp, it took me 16 minutes, which is fine. Here are my considerations:

I fear the manipulation might be a bit weak. Maybe it's just me, but after a bunch of pictures, due to time pressure, I was so concentrated on the actual images that I completely forgot about the labels that preceded them. I didn't really pay attention to them, I just experienced them as some sort of countdown. We all know that priming works even subconsciously, but my guess would still be that if we want to keep the labels as primes (@DominiqueMakowski 's idea of priors arouses me more than the pics!), we should leave them on screen for slightly longer. It makes sense to make them disappear when the image appears, but I fear that, as it is now, they won't work perfectly as proper primes. (Not sure about @MarcoViola88 's idea of repeating AI-generated or Picture in each image's prompt; maybe it can strengthen the manipulation... even though it's wordy.)
Too many images are completely unrelated to sex. I felt this as a bit awkward. My naive feeling was: "You're asking me about sexual arousal, why the hell am I seeing elderly couples hand in hand?! They're so tender they're actually the opposite of a sexual stimulus". I think it's ok to have some kind of baseline for the arousal, but now the DV(s) would be so (unnecessarily?) skewed! It can or cannot be a problem, but I would maybe refrain from putting the participants in some cognitive dissonance-like state.
In the screen where we explain the task, for the sexy variable we say "this question is about how sexually appealing you believe this image would in itself be to people similar to you in terms of gender and sexual orientation". Just a thought on this: such phrasing will eventually lead to a correlation between the two measures. However, if we want to capture a more general difference between subjective / objective measures (or 1st / 3rd person), we should probably frame the prompt in a more general fashion, such as: "this question is about how sexually appealing you believe this image would be for an average person". For me, it would be nice to capture that maybe one deems a pic to be very arousing for others, but not for themselves, or vice versa. Does it make sense?

These are of course points of discussion. But @DominiqueMakowski , please feel free to ignore them if your time is running out.

DominiqueMakowski commented 1 year ago

All good, I prefer to take a few days more and have something solid than rushing things out with major flaws. thanks a lot for your time and input, I'll read that again tomorrow and will propose some further changes and will let you know :)

MarcoViola88 commented 1 year ago

Some quick comments on @AleAnsani's points:

I also fear the manipulation could be weak. Some non-mutually exclusive options to reinforce it might be:
1a. Clustering photos and AI-generated images in 2 big blocks, each introduced by a cover story; or maybe in 4 small blocks (to avoid habituation).
1b. As mentioned, reproposing the manipulation during the TASK INSTRUCTIONS. I disagree with @AleAnsani about wordiness: we only need to switch from "How much would you consider this image to be sexy on average?" to "How much would you consider this PHOTO/AI-GENERATED IMAGE to be sexy on average?", i.e. +0 / +1 words respectively!
1c. Watermarks? But I am not very convinced about them -- they might best be used for the authenticity manipulation, as @DominiqueMakowski convincingly argued.
1d. Slightly longer presentation for the cue, as suggested by DM...
1e. ... but also, what about using different colors for AI-GENERATED & PHOTO? Or maybe an iconic logo together with/instead of the image, e.g. a photo machine & a stylised neural network? They might be more powerful as subliminal primes.
I also felt a little bit of cognitive dissonance when presented with sensual images of horny women right after a picture of elderly couples XD. That's my main reason for suggesting that the arousal rating task needs to be about the stimulus, not about one's body, since a body-centered self-report might be more influenced by carryover effects. Yet, I have no clear suggestion as to how to avoid this issue. That being said, I am not entirely sure we should remove the stimuli. I suspect that's mostly a choice based on expected statistical technicalities, hence I am glad to ultimately defer to the experts here.
I made a similar point in favor of "average person", although it risks being interpreted sometimes as "average person, irrespective of gender & orientation" and sometimes as "average person of my gender and orientation". But there is no free lunch...

That being said, I am happy to discuss whatever further issue, but equally happy to trust @DominiqueMakowski for whichever choice he could make based upon this discussion!

DominiqueMakowski commented 1 year ago

Duration

Okay so it seems like we're currently at ~15min with a 2-phase expe with 54 stims. I think that's very good, we can easily throw in 1-2 questionnaires.

Cue

If we observe a small effect (or an absence) we can always try to make the manipulation stronger by going to the next level and crafting custom implicitly-priming descriptions or watermarks. It can be a natural follow-up to also see the difference between explicit dichotomous priming and something more "ecological" and realistic

different colors for AI-GENERATED & PHOTO

Good idea, will try that (the colors will be assigned randomly between, like, blue, green and red for each participant)
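For illustration, a minimal Python sketch of per-participant random color assignment. The condition labels and function name are hypothetical, and the actual experiment code may do this quite differently; the point is only that each participant gets two distinct colors drawn from the three, so color is not confounded with condition across the sample.

```python
import random

CUE_COLORS = ["blue", "green", "red"]
CONDITIONS = ["AI-generated", "Photo"]  # hypothetical cue labels

def assign_cue_colors(rng):
    """Draw two distinct colors and map them onto the two cue conditions."""
    colors = rng.sample(CUE_COLORS, k=len(CONDITIONS))
    return dict(zip(CONDITIONS, colors))

# One mapping per participant; the seed is only here to make the sketch reproducible
mapping = assign_cue_colors(random.Random(42))
```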

Or maybe an iconic logo

This makes it hard to counterbalance/randomly assign, opening us for criticism about potential confounds

we should leave them on screen for slightly longer.

Okay will do.

Clustering photos and AI-generated images in 2 big blocks

Me no-likey block paradigms 😬

(@DominiqueMakowski 's idea of priors arouses me more than the pics!)

lmao

Too many images are completely unrelated to sex

Mmh that's an interesting one.

Removing them: I can see the argument in favour, and the sentiment that you expressed about the cognitive dissonance. But again, I've been so used to seeing most emotional-stimuli-using studies using neutral as a control condition for the emotional categories that removing them makes me feel weird. I'd say let's take it step by step: we try first with and then see how it goes.
Lowering the number of neutral stims: I'm worried this would make them even more salient and thus "weird"? Maybe I can try to reduce just slightly, I'll tinker with the numbers and see.
One possible solution is to improve the general instructions? E.g., emphasizing that we present ero and non-ero stims and to somehow suggest that "it's normal to be not aroused for some of the pics".
Maybe this problem could be solved by adding a scale of "This image made you happy" (as in on a psychological, not-necessarily sexual, level) / "I found this image wholesome". That way these images would find their "purpose" with regard to the ratings and would thus not be perceived as odd ones given the task-demand characteristics.

"By an Average person"

"although it risks to be interepreted sometimes as "average person, irrespective of gender & orientation" and sometimes as "average person of my gender and orientation"

Fair point...

For me, it would be nice to capture that maybe one deems a pic to be very arousing for others, but not for themselves, or vice versa

One could argue that we have perhaps internalized the idea that "you always find someone attracted to the weirdest thing". Thus, in many cases, answering "someone else could find this arousing" would not be a genuine judgment, but just a reflection of this belief about the variability of sexual tastes.
Another argument is that "average person, irrespective of gender & orientation" doesn't exist. As in, you can't really average the sexual ratings of all people for a specific given thing, there are just too strong grouping effects (like a very sexy girl image, that would be rated as 10 by straight guys, could be rated as -10 by women. And the conclusion that "this image is average" is not true). And thus we're asking the participants to answer about the average of potatoes and carrots, which puts them in a difficult position.
Another argument is that while people can arguably meaningfully generalize to their "group" (with all the caveats), it's delicate to ask them to generalize to everybody, and thus the measure loses meaningfulness.
About artificially forcing the correlation: it's true, specifying the same group will likely increase the correlation. But we could think of it as a gradient ranging from "embodied 1st person" to "disembodied / conceptual 3rd person". And in the middle, would be the "on average from someone of the same group as you", which in this case is a more meaningful level by which to look at it?
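The "average of potatoes and carrots" argument above can be illustrated with a toy Python example. The ratings are invented and simply echo the 10 vs. -10 example; the group labels are placeholders.

```python
from statistics import mean

# Hypothetical ratings of one image by two groups with opposite reactions
ratings = {
    "straight men": [10, 9, 10],
    "women": [-10, -9, -10],
}

group_means = {group: mean(values) for group, values in ratings.items()}
pooled_mean = mean(r for values in ratings.values() for r in values)

# The pooled mean sits at 0, a value that no individual rater gave
print(group_means, pooled_mean)
```

The pooled mean suggests an "average" image even though every rating is extreme, which is why a group-specific reference ("someone similar to you") may be the more meaningful level.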

Thanks so much for the comments it's really cool to brainstorm this I feel like we're basically doing the job of future reviewers to ourselves and making our decisions fully justified and thought-of!

Keep 'em thoughts coming! we still have some time ☺️

AntonioOLR84 commented 1 year ago

Hi guys! I feel like I'm in uncharted territory, but "me quito el sombrero" (Spanish phrase) to the level of this discussion. I just carried out the experiment (16 minutes, which is nice), so here are some first impressions:

- I wouldn't mention that the study takes 30 minutes on the first screen. In the age of TikTok, I'm afraid that 30 minutes can freak out more than one participant. If we do mention the duration, I would be inclined to say 15 minutes.
- I agree with @MarcoViola88, I would mention country instead of ethnicity.
- I definitely share the opinion about the weakness of the manipulation. After a few trials it was easy to ignore the words and base answers entirely on the images. Or at least that is the subjective experience. As the manipulation currently stands, it could function as a variant of suboptimal priming. In that case, it may be convenient to control the affective dimensions (valence) of the cue words.
- Maybe it's me, but after a few trials it was evident that my response strategy was to first evaluate my personal rating (arousal) and, based on this rating, to try to estimate the first scale (average sexy). I think that the content of the stimuli itself (sexy images) can facilitate this response pattern. Along these lines, I am not convinced by the term "average" for the objective scale. @AleAnsani Do you remember how we framed the attractiveness question in the andro study? I think we faced a similar dilemma.

Un abrazo!

DominiqueMakowski commented 1 year ago

@marmarini said:

The most prominent issue I encountered during the task is the potential for participants, myself included, to forget whether an image is real or AI-generated. This is a significant concern as it may undermine the results of the experimental paradigm. Participants could lose track of the nature of the images, potentially compromising the validity of the data. Could we consider displaying this information prominently above each image to mitigate this forgetting?
I found the task to be lengthy and fatiguing, particularly in relation to slider usage. I ended up rapidly moving the sliders resulting in overlooking specific values on the scale. An alternative could be the use of a Likert scale, where participants simply click on the desired value. Maybe a 9-point Likert scale could simulate a continuous variable, potentially simplifying the participant experience.
Another significant concern is the potential imbalance between erotic and non-erotic stimuli. A scarcity of arousing stimuli might lead participants to react more strongly to the limited instances of arousing content, introducing bias into the responses. This phenomenon, which I would term "the nice buttocks effect," could lead participants to overlook information regarding the nature of the image (real or AI-generated) when encountering appealing content amidst a sea of uninteresting pictures. I mean, buttcheeks are extremely salient.
Is the paradigm compatible with mobile devices and smartphones? Ensuring compatibility with these platforms would be essential for participant engagement and the practicality of conducting the experiment in various contexts, especially given its perceived length for an online experiment.

marmarini commented 1 year ago

I tried to complete the experiment using a mobile device and encountered significant difficulties, particularly with slider movement and image formatting. The images were often too large for the screen, making it challenging to use the slider without zooming in. Probably, I would ensure compatibility with mobile devices, considering the prevalence of their use in online studies.

DominiqueMakowski commented 12 months ago

@MarcoViola88 @marmarini @AleAnsani @AntonioOLR84

1. Strength of Manipulation

[x] I added random color-association (green, blue, red).
[x] I have increased the duration of the cue
[x] Change "photo" for "picture"
[ ] Should we change the "Picture" cue to "Photograph"? I can ask english natives what they think is the most "real"

I hear the concerns that the manipulation is not strong enough and that people will "not read" / "forget" the prime and there won't be any effect, but I am fairly confident that even with this weak form of manipulation it will work (we have done it in the past with an even weaker type of cueing (Sperduti 2016) and it worked quite nicely). I think the sentiment might also be confounded by the fact that we as experimenters know the true nature of the stims and thus we are not engaged in the task in the same way as participants would be.

I think the only way to definitely answer that is to run the study on a "preliminary" sample and see. We can start with the UK sample while we translate the experiment, then quickly check: if it works well we deploy at full scale, if it doesn't we revise.

2. Multiplatform

To be honest, I would start with restricting the experimental condition to computers. I can see the value of going multiplatform to maximize the number of participants, but it introduces a ton of issues, among which:

People doing the experiment on their phone are arguably not in the same condition as the ones doing it on their computers. They could be communicating, walking, multitasking (notifications popping up during the experiment) and whatnot.
Screen size: we work with visual stimuli, and it's quite different to see an image on a computer screen than on a smartphone (that's why we collect a proxy of screen size to eventually control for it). Beyond the images, the experience (even improved to fix the current problems with the scaling etc.) would be different.
We would need to do some analyses to compare between platforms, which is doable. The main feasibility problem is to make it work on mobile, which is feasible, with some work. But is it worth the trouble? We open ourselves to a wave of potential concerns and more noise in the data (esp. from mobile users).

Again, I'd advocate a step-by-step approach, we start with computers, and if we struggle to gather enough data to estimate reliable effects, then we adjust and open-up.

3. Analog/Likert scales

Maybe a 9-point Likert scale could simulate a continuous variable

Don't say that to a statistician 😁

Yes, analog scales are slightly more tedious, but IMO that's one of their strengths. Beyond yielding a "true" continuous variable, they also avoid some response biases (in particular automated responses where people just click on pre-determined numbers, as well as response clustering where the distribution gets skewed towards some response options).

That said, I don't have a strong preference here (I just like true analog scales because it's nicer to statistically model and visualize), if the rest also think it's worth the change, we can give it a go :)

4. Neutral images

I updated the stim selection to increase the number of ero stims and decrease the neutral ones. We have now 60 stims (see here). Again, the reason for their inclusion is this:

(Most salient for men) without them (grey and brown stims) we have multimodal distributions. Having a more continuous multidimensional space makes it easier to model them.
It captures spaces otherwise unexplored (e.g., high valence for women)

The risk of further decreasing their number is the "inverse nice buttocks effect" whereby their salience would be inflated.

5. Side questionnaires

Thoughts on side measures?

Please continue arguing against/for. This dialectical process is really good :)

MarcoViola88 commented 11 months ago

Hi Dominique, sorry for my silence about it -- drowning in teaching + hardcore conferencing, hope to be a tad more free next week. But luckily you've handled it splendidly! Just a few remarks:

RE: 1. Strength of Manipulation I'm actually pretty happy with these changes -- and not confident we could get MUCH MORE within the inescapable constraints of an online experiment (and probably of some in-person experiments too). With this little boost, I'm confident we will see 'something'. It might not be huge, but we neither need nor expect HUGE differences, right? So let's check this and then see what happens.

2. Multiplatform Ok, I see the problem about laptop-smartphone comparisons. TBH, I am not entirely sure that the problem you correctly identify about smartphones (e.g. getting distracted) won't apply to laptops -- personally, I am pathologically multitasking both with the smartphone and with the laptop. But I see that other problems (e.g. screen size) will matter. Hence, although recruiting participants on laptops will be slightly more difficult than recruiting them on smartphones, let's begin a 1st round of recruitment with laptops for now!

Nothing to say about 3-4, I defer to the experts. Will check 5 in a few days hopefully.

In the meantime, I'll stay updated with the others' comments!

DominiqueMakowski commented 11 months ago

I made a few additions:

Demographics

Question about contraceptive pills: this is to suit the research project of @naomirajagukguk, who investigates whether birth-control-induced hormonal changes lead to differences in arousal (and a potential interaction with the effect of fiction)
Question about the last time someone had a sexual activity (masturbation or sex); this is also related to the previous one. Essentially meant as a proxy of "horniness". But maybe there is a better way to phrase it?
Question about porn exposure. Again, what's the best way to phrase that?
Question about how knowledgeable in AI the participant considers themselves to be.

"Control" questionnaire(s)

I checked again exactly what we used in our recent FakeFace study. The goal was to add questions about people's expectations about image-generating algorithms (since we also had a cover story that we are testing an AI-image generation algorithm). But to not have these questions alone (to avoid raising suspicion), we intermixed them with items from the GAAIS, which contains general questions about attitudes towards AI.

I re-checked the items in detail and also read Schepman's revalidation paper of the GAAIS. Based on their newest data, as well as on ours, I decided to revise/improve the original combo - now named for the occasion the Beliefs about Artificial Images Technology (BAIT) questionnaire. I took 6 (3 positive + 3 negative) items from the GAAIS, + 6 BAIT-proper items. These items are aimed at measuring people's expectations regarding CGI that could interfere with our experiment design. What do you think?

With these additions, the duration should now hover close to 20 min. Unless there is something important to add, I think we're almost "feature-complete". I think we're still not 100% there with regards to the scale instructions, so please don't hesitate to scratch your brains a bit more here so that we are clear with what we are trying to do

Note that the link to the experiment has changed (but always available from the README)

@MarcoViola88 @marmarini @AntonioOLR84 @AleAnsani

AleAnsani commented 11 months ago

Hey all! Sorry for the late reply. I took some time to participate in the new version of the study. Thanks, @DominiqueMakowski , for your incredible work! I agree that we're almost there!

Here are my comments:

I definitely noticed the increase in length (it took me more than 20 min), and I had the impression that now we have too many pics. As I was saying, I'm just worried about the reliability of long-lasting exps, but I trust you all on that.
The strength of manipulation is greater now. I'd just increase the font size of AI-generated and Picture (also, it's great that you randomized the colors!).
MAJOR POINT: I fear that the question about the birth control pill, which is very intimate, and the whole demographics section, placed at the beginning of the experiment, could discourage many participants from going on with it. I would put the whole demographics section at the end. Moreover, I would also probably add an introductory statement to the more intimate section stating something like: "The following questions are really important to us; however, they are not mandatory. So please feel free to skip them if you feel uncomfortable answering.". In this way, if they don't feel like it, instead of dropping out, they can just go on, and we'll still have the chance to keep their data about the task. Very minor point: can we make an IF statement or a branch so that we don't ask the question about birth control to male participants?
Questions on porn consumption / masturbation: why are they open-ended? Can we take some items from this scale?

Here's the reference: Hatch, S. G., Esplin, C. R., Hatch, H. D., Halstead, A., Olsen, J., & Braithwaite, S. R. (2023). The consumption of pornography scale–general (COPS–G). Sexual and Relationship Therapy, 38(2), 194-218.

Last time sexual activity: what's the rationale for that? If we have some, it's fine; otherwise I would remove it. In general, I would remove any item that unnecessarily increases potential discomfort and increases dropout rate/missing data.
@DominiqueMakowski out of curiosity: how did you select the items of the GAAIS scale? Have you adopted a criterion which is justifiable in the paper? (I mention this because the Editor of our current article complained about the fact that we didn't use the full GAAIS but just 6 items. We mentioned that Cronbach's α and McDonald's ω were both > .82, but she still required us to state that a reduction in the items could be problematic).
Phrasing of the prompt: I'm getting convinced more and more that whichever phrasing we'll use, the two measures will be highly correlated, so I don't find compelling reasons to change it, tbh.
Randomization: I don't know if this has already been done, and I know that with GLMMs (participants and stimuli as random intercepts and slopes) this is not so much of a problem, but I still believe that we should make sure that the pictures depicting women and men are equally distributed between the AI and Picture categories. We don't want an AI category with just 20% of pictures depicting women and a Picture category with 60% of pictures depicting women. Again, idk if this was already done or not; so sorry if I'm just adding unnecessary concerns.
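
On the reliability point above (Cronbach's α for a 6-item subset), here is a minimal sketch of how α can be computed from a participants × items matrix, using the standard formula α = k/(k−1) · (1 − Σ item variances / variance of total scores). The function names and the example scores are made up for illustration; in practice one would probably use an established R package rather than this.

```javascript
// Sample variance (n - 1 denominator) of an array of numbers.
function variance(values) {
  const n = values.length;
  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  return values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / (n - 1);
}

// Cronbach's alpha for a participants x items matrix of scores:
// alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
function cronbachAlpha(scores) {
  const k = scores[0].length; // number of items
  let sumItemVariances = 0;
  for (let j = 0; j < k; j++) {
    sumItemVariances += variance(scores.map((row) => row[j]));
  }
  const totals = scores.map((row) => row.reduce((sum, v) => sum + v, 0));
  return (k / (k - 1)) * (1 - sumItemVariances / variance(totals));
}

// Hypothetical data: 4 participants x 3 items (made-up numbers).
const exampleScores = [
  [1, 2, 1],
  [3, 3, 4],
  [4, 5, 4],
  [6, 6, 7],
];
const alpha = cronbachAlpha(exampleScores);
console.log(alpha.toFixed(3));
```

Note that α only says the retained items hang together as one score; it does not by itself show that the short form measures the same thing as the full scale.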

I think that's it. Thank you for your time and sorry again for my late reply.

@MarcoViola88 @marmarini @AntonioOLR84

DominiqueMakowski commented 11 months ago

in case we slightly overshoot 20min, let's keep in mind that clearly our experiment is much more engaging than typical cognitive tasks 😁 Pretty sure many participants would like it to last even longer 👀 But to reassure you, for what it's worth, today's participant (with the latest version - probably you?) did it in ~18min
Will increase the size
Good point! We'll put the demographics at the end.
I'll try to have the birth control question display conditionally
porn consumption: Good idea: I'll add the 6 items of the Frequency and Duration subdimensions (because otherwise it's a bit much and we don't really care about the rest I think)
Last time sexual activity: well, we could make interesting hypotheses about the influence of the bodily state on the effect of fiction:
- [ ] "Horniness makes us behave like animals" hypothesis: being horny would increase the arousal ratings (main effect), and would also skew the judgments and attention towards the "erotic" value of things, and away from their nature, thus lowering the difference between real and AI
- [ ] "Horniness increases saliency of relevant stimuli": an evolutionary hypothesis: being horny would actually maximize the arousal towards "real" stimuli - because they are relevant - and minimize the arousal towards non-relevant & unreal stims, thus increasing the difference between real and AI

But arguably yes, this is very exploratory and this item is probably not a very good proxy of such a bodily state, but I just thought we could throw it in there as an optional question just out of curiosity ☺️ we can remove it though. What does our philosopher think? @MarcoViola88

I think originally we picked the ones with the highest loadings (that's my current Self recall), but when I looked at the re-validation they are definitely not the ones. But in any case for us we didn't really care about the GAAIS, we mostly picked relevant items from it to mix with our target items about CGI. In the current version I picked the items with the highest loadings that are relevant and removed the ones with the low loadings (to be honest I found some "high-loadings" items a bit off). Happy to change that if others have better ideas... The way I'm thinking to go about it here is to say basically that we are kind of creating a new scale (BAIT), with some items strongly selected and adapted from an existing scale (GAAIS) to make the scale's primary goal less overt. If we have enough participants - which we should - and especially if we have multi-language data, we can do a small factor structure validation and put it in the appendix, and if it has some influence on the effect of fiction it would be a good début of evidence for its validity.

Regarding your question, I think your justification using alpha/omega holds, as it shows that you can reduce these 6 items to one meaningful score. Then you can say that it was justified based on the needs and hypotheses of your study. (But then maybe the editor is a GAAIS author, you never know 😬)

We should organize bets about how much correlation we all predict between the two scales 💰 💰
Yes, the conditions are assigned within each category so there should be an equal amount of fiction/real for each subcategory
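
To make the "assigned within each category" idea concrete, here is a minimal sketch (not the actual experiment code). It shuffles the stimuli inside each category and alternates the two cues, so each category ends up with an equal split (or off by one if the category has an odd number of images). The category names, file names, and the seeded random generator are all assumptions for illustration.

```javascript
// Small seeded pseudo-random generator so the example is reproducible
// (illustrative only; not necessarily what the experiment uses).
function seededRandom(seed) {
  let state = seed >>> 0;
  return function () {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle returning a new array.
function shuffle(items, random) {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Assign cues separately within each category so that every category
// gets (as close as possible to) a 50/50 split.
function assignConditions(stimuli, random) {
  const byCategory = {};
  for (const stim of stimuli) {
    if (!byCategory[stim.category]) byCategory[stim.category] = [];
    byCategory[stim.category].push(stim);
  }
  const assigned = [];
  for (const category of Object.keys(byCategory)) {
    shuffle(byCategory[category], random).forEach((stim, i) => {
      const condition = i % 2 === 0 ? "AI-generated" : "Photograph";
      assigned.push({ ...stim, condition });
    });
  }
  return assigned;
}

// Hypothetical stimulus list: 4 images per category.
const stimuli = [];
for (const category of ["women", "men", "neutral"]) {
  for (let i = 1; i <= 4; i++) {
    stimuli.push({ file: `${category}_${i}.jpg`, category });
  }
}

const trials = assignConditions(stimuli, seededRandom(42));

// Count cues per category to check the balance.
const counts = {};
for (const trial of trials) {
  const key = `${trial.category}/${trial.condition}`;
  counts[key] = (counts[key] || 0) + 1;
}
console.log(counts);
```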

DominiqueMakowski commented 11 months ago

EDIT: I'm tired - it's at the end

About the COPS

I'm surprised that the minimum option is once and not zero 🤔

DominiqueMakowski commented 11 months ago

About the COPS

They don't mention the instructions in the paper, so I had to come up with something, what do you think?
Asking for the frequency for the past 7 days, month and year seems redundant. Based on their validation the Month (and week) seems the most loaded. Should we drop the year (and the week) question?

My thought is that when you ask "within the past year", people draw on their "semantic Self", i.e., their beliefs about themselves "in general". When you ask within the past week, it's closer to an episodic retrieval of the actual real number, but the problem is that it becomes noisy as one's behavior in the past week is not necessarily reflective of a general tendency. So maybe that's why the "within the past month" taps on a sweet spot between generalizability and noise?

DominiqueMakowski commented 11 months ago

Same for the duration items, I find the last one fairly irrelevant (despite its high loading). Additionally, items 1 and 2 are potentially overlapping and hard to tell apart, as many people only view porn on websites. So I think here too I'd stick with the first item 🤷

MarcoViola88 commented 11 months ago

Hi guys, thank you for the splendid fine-tuning :-)

Since I'm pretty convinced by your 'trimming' of the scales (& not authoritative enough when it comes to statistics, sadly), I won't add anything there -- I like the idea of adopting 'slimmer' versions of several scales. (BTW, I love the idea & acronym of the 'BAIT' scale!)

But let me express my (partial) skepticism about horniness.

On the one hand, I agree that a good proxy of horniness would reveal interesting correlations.
On the other hand, I suspect that the current

[Question](https://github.com/RealityBending/FictionEro/blob/d8eeb03ad7beed473bc2d004ee75a4f143b39af3/experiment/demographics.js#L202-L203) about what is the last time someone had a sexual activity (masturbation or sex), this is also related to the previous. Essentially meant as a proxy of "horniness".

is not a good proxy of horniness. I might be a very horny person who does not engage in sexual activity (incl. masturbation) because I have no time. I might be horny and HENCE have frequent sexual intercourse; OR, I might be horny because I have not had sexual intercourse / masturbation recently. And so on. I might have had sexual intercourse despite my lack of interest, just to please my partner (creepy as it sounds, it happens, I guess...). In sum, I am not convinced this is a good proxy. Perhaps a better way would be to ask it straight away: how "horny" do you take yourself to be in general? I see some drawbacks of this sort of question, of course, but it still seems preferable to using such a spurious 'behavioral' proxy...

ANOTHER ISSUE JUST POPPED TO MIND: We should instruct people to run this experiment when they are alone. In fact, they can be embarrassed by seeing buttocks in public -- but that's THEIR problem. OUR problem, on the other hand, is that having an audience (or not) might interfere with the experience. One might engage less with intimate pictures if a crowd is watching, I suspect. Now, having no subjects doing the experiment from a mobile phone mitigates the risk; but maybe being explicit in the instructions about being alone during completion (and leveraging the fact that they're about to see some NSFW material...) is in order. Isn't it? Or we could at least/also check whether subjects were alone during the experiment with a post-hoc question.

I guess that after this latest fine-tuning we're almost done, aren't we?

DominiqueMakowski commented 11 months ago

I implemented most changes, added a few screens here and there to fluidify the experience, finalized the consent form & debriefing, adjusted the order of things, grouped/conditionally displayed items (e.g., birth control) etc. I am quite happy with it :)

Things of note:

I did not put the demographics at the end. I tried, but it didn't flow well, in a way it makes sense to start with collecting info like any admin forms. That said, I added in the instructions for the birth control that it's optional. Same for the porn habits questions, so that participants don't feel weird (though given the nature of the experiment I'm not sure that a question about porn will throw them off 🤭). Let me know if that's alright
After a lot of trials and back and forth, I changed the "sexy" scale description with - what I think - a more accurate word: "enticing", that captures the notion of sexual appeal, objective attractiveness and desirability. I think it makes the distinction also clearer now, and so should be less weird for the participants. I amended the instructions to be as follows:

Please give it a go: https://realitybending.github.io/FictionEro/experiment/english1.html I'll wait for your green light with the aim of sending the ethics application by the end of the week (I know I've said that every week but now it's real - I think)

DominiqueMakowski commented 11 months ago

Note: after a brief survey with native speakers, we changed the "Picture" cue in favour of "Photograph" which seems to be more "reality-loaded"

DominiqueMakowski commented 11 months ago

Don't hate me but after further thinking, I think it would be incomplete, especially given the presence of non-ero stims, to not have a question about emotional valence. It offers a new dimension of pleasantness/unpleasantness that could capture reactions to non-ero stims as well as ero images that one could judge as disgusting etc.

We piloted this new version on a couple of people and the duration seems to be ~24min (adding the 3rd scale probably adds ~2min, but I think it's worth it?), but the feedback was that it's rather fun and not too long 🤷 (it's the first time I personally ever run a study this short 😁) We asked whether the scales were well-explained and "made sense" and it seems alright. We will still run a couple of pilots just to make sure everything is alright once I have your greenlight 🚥

MarcoViola88 commented 11 months ago

Hi guys, I've done a test today -- apologies for postponing it so much, I was kinda overwhelmed by teaching duties until a week ago :-S My impression is overall very positive: many doubts I had reading instructions in 3rd person were solved by tackling the exp in 1st person. My hunch (or "phenomenal experience" if you feel philosophical :-P ) is that the manipulation DOES WORK. Although I knew the paradigm, I couldn't resist the impression that some images are artificial -- and that triggered something (sometimes, uncanny feelings!). Moreover, while it complicates things a bit, the third axis (VALENCE) might capture some interesting dissociation, e.g. "it looks enticing but since you told me it's fake I'm feeling unpleasant". A few minor issues I invite @DominiqueMakowski to consider (but I won't enforce them):

questions about ethnicity & country should be put in a menu rather than a free input box
in the instruction page, why specify that photos are "adjusted to be of similar dimension and aspect as the artificially-generated images"? It might nudge people into thinking that even photos are slightly 'artificial' -- something we don't want. Moreover, the instruction page is quite long, so why not erase the sentence above altogether?
in the debriefing phase, a minor concern about "we would like to see if you found our image generation algorithm convincing and artifacts-free". I'm no English native, but "artifacts" sounds a little technical: for people outside our job, it might feel strange. What about synonyms, e.g. "error"?
Debriefing boxes are quite useful. We might consider adding an input box too in order to collect suggestions on how to interpret the data? (Might come in handy while writing the discussion)

Whatever Dominique decides to do with the points above, I think we are ready to move to the next step. Dominique, have you already submitted the application to the IRB? In order to begin data collection we need to decide on a protocol (e.g. snowball sampling? only free participants or paid participants too? and so on). Moreover, we need to translate the experiment into Italian & Spanish -- I'm ready & willing to do that in the next few days!

PS happy new year guys!

DominiqueMakowski commented 10 months ago

I had reading instructions in 3rd person were solved by tackling the exp in 1st person

Nice decentering skills, we should EEG you while you adopt the two mindsets ^^ Thanks Marco for the thorough testing

This is mostly due to a technical limitation (there is no easy way to add a drop-down menu). But tbh it works pretty well for me to clean the free-text input a posteriori for typos / etc.
I removed it
I replaced it with "error-free"
Added a free-text input

We did submit a first ethics application, we hope to hear back once people are back from the break :) Once it is cleared I'll add the document to the repo and will let you know. Here, as soon as we're good to go the students will start convenience sampling to recruit as many participants as they can (online, free). We'll see how it goes

In the meantime, we can start the translation. For this, one needs to create a copy of this file, named e.g., instructions_italian.js, and translate the text. Then, we'll have scripts simply loading different sets of instructions in different languages (but the experiment code itself is the same, so as to ensure we don't make errors or changes in one version that we don't port over to another version)
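
As a rough illustration of the "same experiment code, different instruction sets" idea, here is a minimal sketch. It assumes each language file exposes an object of strings keyed by name; the keys, the strings, and the helper names (`getText`, `missingKeys`) are hypothetical and not the actual contents of the repo. The English fallback is an extra assumption, added so that untranslated bits still display something and can be listed.

```javascript
// Hypothetical instruction sets. In the repo, each of these would live in
// its own file (e.g., instructions_italian.js); keys and strings are made up.
const instructions = {
  en: {
    continue_button: "Continue",
    demographics_title: "About you",
  },
  it: {
    continue_button: "Continua",
    // demographics_title is not translated yet
  },
};

// Return the text for a key in the requested language,
// falling back to English when the language or the key is missing.
function getText(lang, key) {
  const set = instructions[lang] || instructions.en;
  return Object.prototype.hasOwnProperty.call(set, key)
    ? set[key]
    : instructions.en[key];
}

// List the keys that still need translating for a language.
function missingKeys(lang) {
  const set = instructions[lang] || {};
  return Object.keys(instructions.en).filter(
    (key) => !Object.prototype.hasOwnProperty.call(set, key)
  );
}

console.log(getText("it", "continue_button"));
console.log(missingKeys("it"));
```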

Antonio8424 commented 10 months ago

Hi guys! I hope the vacation lived up to expectations. In my case, that's how it was 😎. Together with Guido we translated the instructions (attached). Perhaps the most sensitive adjustment would be the name of the scale "Enticing." Both Guido and I agree that the literal translation in Spanish has Catholic connotations ("fall into temptation"), which is why we consider that the label "Atractivo/apetecible" better fits what we want to measure. Let us know what you think. Un abrazo! Antonio Instrucciones Spanish.docx

DominiqueMakowski commented 10 months ago

the ethics has been approved 🥳 (it took more time than expected as the committee got reorganized just before Christmas).

So we will start collecting some data asap (PS: do send me your OSF account names if you have one so I can add you to the OSF data repo). I made a few improvements to facilitate the tracking of the "source" of participants (and avoid needless duplication of HTML files): the experiment now collects and saves 2 "URL variables", exp (researcher/source) and lang (language).

So the new URL is now https://realitybending.github.io/FictionEro/experiment/english?exp=TEST&lang=en

note the question mark ? and then the url variables

So when we want to test the experiment, we can write TEST so we can then filter those runs out of the data. The link at the end of each experiment that invites participants to share it with others has exp=snow for "snowball". If I post the link on twitter/X I could add exp=domx. If we put the experiment on some platform we will do exp=prolific. It's basically just to keep track of where the participants come from so we can trace the collection history.
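
For anyone curious how such URL variables can be read and used, here is a minimal sketch with the standard `URL` / `URLSearchParams` API. It is only an illustration: the function names and the example data rows are made up, and the experiment itself may read these variables differently.

```javascript
// Extract the two URL variables (exp and lang) from an experiment link.
function getUrlVariables(url) {
  const params = new URL(url).searchParams;
  return {
    exp: params.get("exp"), // researcher/source, e.g. "TEST", "snow", "prolific"
    lang: params.get("lang"), // language code, e.g. "en"
  };
}

const vars = getUrlVariables(
  "https://realitybending.github.io/FictionEro/experiment/english?exp=TEST&lang=en"
);
console.log(vars);

// Hypothetical saved data: drop test runs before analysis.
const rows = [
  { participant: "p1", exp: "TEST", lang: "en" },
  { participant: "p2", exp: "snow", lang: "en" },
  { participant: "p3", exp: "prolific", lang: "it" },
];
const realRows = rows.filter((row) => row.exp !== "TEST");
console.log(realRows.map((row) => row.participant));
```

`params.get` returns `null` when a variable is absent from the link, so a missing `exp` would need to be handled explicitly (e.g., by labelling it as unknown).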

Within the next few days/weeks (I need to finish preparing a module first), I'll set up the Spanish & Italian versions so that we then just need to replace the text

MarcoViola88 commented 10 months ago

Hi guys, Thank you for the updates!

I'm finalizing the translation in ITA, but I'd like to ask Marco &/or Alessandro to check them before uploading it. Before the data collection, let me check the following: do we want to include someone else (as mere data collector) from other countries AND/OR provide other translations? (e.g. Dominique, do you want to include someone from your past affiliation in France/Poland/Asia?) I think this is NOT necessary, but I ask just because I suspect this is the last good moment to do so (... or not?)

BTW, here is my OSF profile: osf.io/e6js2

marmarini commented 10 months ago

Hi guys,

Fantastic news about the ethics approval! Thanks for letting me know about the tweaks you made. Here's my OSF profile link: [https://osf.io/b4th6/]. Marco, I'm ready for text-reviewing duties whenever you need a hand.

Huge thanks for all your hard work. Can't wait for this data collection to kick off! Marco

DominiqueMakowski commented 10 months ago

let me check the following: do we want to include someone else (as mere data collector) from other countries AND/OR provide other translations?

Good point, let me drop an email to Sperduti to see if he has some bandwidth at the moment to run a French arm :)

AleAnsani commented 10 months ago

Hi guys, sorry I disappeared (again), I've just moved back to Jyväskylä. I tested the exp, it took me around 20 minutes. I find the whole procedure convincing, just three comments:

In the introductory screens of the tasks, we should probably specify that the images will remain on the screen for a couple of seconds. This is in the unlikely event that some participants could think that their laptop has some issues due to which it would go on automatically, without the user's consent.
There's a (concrete?) risk that participants will remember if a pic was presented as real or AI-generated, and respond accordingly in Task 2 (we should remember this in the analytic phase, perhaps by averaging the scores).
I am a bit afraid to leave the debriefing screen as it is now. In particular, I am afraid that some participants might talk with other future participants and spoil the trick. What if we tell the same information through a hyperlink? If the ethical committee consents. Something like: if you want to know more about this study's purposes, please click here.

What do you guys think? Anyway, these are minor things, and I wouldn't object to moving on to the next phase without making these tiny adjustments.

(Sorry about any possible mistake here, it's a bit late and I'm sleepy, but I wanted to give you my take on the exp)

P.S.: My OSF profile is: osf.io/47v9u

DominiqueMakowski commented 10 months ago

Thanks @AleAnsani

Fair point, I thought about where to insert it without altering the instructions too much (otherwise it would likely require an amendment) but didn't really find something easy (in particular, I wanted to change the after each image, you will have to... part, but doing something like after each image is presented for a couple of seconds, you will have to makes it more confusing). But while it's true that it might surprise some participants the first time, I think they'll understand that it's how it's meant to be after a couple of trials
Yes, that would even be good as it would show that they paid attention to the manipulation
Fair point, but that's an ethics requirement (people must be debriefed before submitting their data to give them a last chance to withdraw their participation). Changing this would require an amendment. As a mitigation, we now track the "snowballed" participants, so we can check a posteriori if there is a difference between that group and the "first-gen" participants

+ We started collecting and we have already a couple of participants so far ^^ so let's just roll for now and see how it goes

DominiqueMakowski commented 10 months ago

I'm meeting @marcosperduti this week to see
I initialized the Italian version:
- https://realitybending.github.io/FictionEro/experiment/italian?exp=TEST&lang=it
- What I expect is that we will find some non-translated bits and pieces (especially default components like 'Continue' buttons etc.) that we'll still need to move from demographics.js/fiction.js to instructions_*.js. You can start testing the Italian version, and let me know where there is remaining English so I can enable its translation :)

MarcoViola88 commented 10 months ago

Hi,

please let me know how the meeting with MSperduti is going! Hope he won't mind jumping in even though we are at a quite advanced stage
I also expected that some fixes were in order for the Italian translation. I've fixed them when they pertained to instructions_italian.js. When I could not find them, I reported them in a separate issue (which we could hopefully close soon).
BTW, I hope you don't mind that I mentioned to Italian participants that I am a further contact person for the study besides you, @DominiqueMakowski - this was in order to avoid having emails in Italian sent to you. It goes without saying that your major effort will be acknowledged when it comes to establishing first/last authorship!
We think that the Ethical Board approval we had for our last study can also cover the Italian data collection (only for self-reports, though; we'd still need to make another application for collecting physiological measures, but let me postpone the problem for now). Hence, we are virtually almost ready to begin data collection in Italy. Before doing so, however, I'd like to know which criteria we should aim for (if any). I know we don't need strict ex-ante inclusion criteria since we have a good control post-hoc, but still I think it's useful to have some rough guidelines (e.g. could we also try to get some Prolific participants? Are we aiming mainly for 20-30yrs students?)

DominiqueMakowski commented 10 months ago

Hence, we are virtually almost ready to begin data collection in Italy.

Nice!!

(e.g. could we also try to get some Prolific participants? Are we aiming mainly for 20-30yrs students?)

Well afaic the only limitation for prolific is money 😁

Just for budget calculation, they say " We recommend you pay participants at least £9.00 / $12.00 per hour, while the minimum pay allowed is £6.00 / $8.00 per hour."

So for a half-an-hour experiment it would be £3-4.50, + prolific fees. I'd say £4/4.70€ (or £4.30/5€) per participant in total (including prolific's cut) is fair. So we have to budget around 50€/10 participants.
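
The arithmetic above, spelled out as a small sketch. The hourly rates are the ones quoted from Prolific in this thread and the 5€ all-in figure is the estimate given above; the function names are made up, and fees and exchange rates are not modelled.

```javascript
// Participant payment for a given hourly rate and study duration.
function payPerParticipant(hourlyRate, durationMinutes) {
  return hourlyRate * (durationMinutes / 60);
}

// Rates quoted in the thread, in GBP per hour, for a 30-minute study.
const minimumPay = payPerParticipant(6, 30); // minimum allowed rate
const recommendedPay = payPerParticipant(9, 30); // recommended rate
console.log(minimumPay, recommendedPay);

// Total budget for n participants at an all-in cost per participant
// (5 EUR here, i.e. the rough estimate including the platform's cut).
function totalBudget(nParticipants, costPerParticipant) {
  return nParticipants * costPerParticipant;
}
const budgetFor10 = totalBudget(10, 5);
console.log(budgetFor10);
```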

I'd like to know which criteria we should aim for (if any)

Which makes me think that we kinda-sorta forgot to preregister 😱 Either we do it now (but we say data collection is under way, but not processed); or we preregister the non-English versions "separately"? I don't think it's a big deal, since our hypotheses are quite obvious and clearly stated, but well, better to do it than not.

As for the inclusion criteria, what do you have in mind?

translation

Thanks for the detailed issue on the translation, I'll try to fix the remaining points

Antonio8424 commented 10 months ago

Hi guys, this is definitely good news. Since the "original" consent is already approved, I am going to ask the ethics committee of my university whether this approval is sufficient to apply it to the students I have. I share @MarcoViola88's observation: is it feasible to include my contact name in the Spanish version? I think this would also facilitate dissemination, since it would be easier to see that the university is part of the research. I agree 100% with @MarcoViola88 that @DominiqueMakowski's tremendous effort should be reflected in the authorship of the article. I will be waiting for any questions regarding the Spanish version. Un abrazo, Antonio

DominiqueMakowski commented 10 months ago

include my contact name in the Spanish version

of course, feel free to even remove me if you prefer, I don't mind (it's just that I need to be there for the English version as per the ethics approval, but otherwise I really don't care)

marcosperduti commented 10 months ago

Nice to meet you all, and thanks for the collaboration proposal. I've read the whole brainstorming, and I think that you've already discussed the major issues there and found the optimal solutions. The next step for me, if I've understood correctly, is to create a French version. I will probably have to apply for ethical approval at my university. But I hope that they will be indulgent, since there already is an ethical approval from another university.

MarcoViola88 commented 10 months ago

Hello Marco, happy to have you onboard :-) And BTW, since this brings the # of Marcos involved in this study to 3 (including myself and Marco Marini), I propose to include last names in further communications XD

I see from OSF that data collection is going smoothly in the UK. Good to know! Let me know if we can proceed to set up the data collection strategy in Italy too (@marmarini or @AleAnsani will do another check of the Italian version of the exp to see if there are still open issues to be fixed). Have we set an ideal # of participants to collect for each country, e.g. via power analysis?

AleAnsani commented 10 months ago

It'd be nice to have a chat about power analysis with @DominiqueMakowski. I don't know if we're going to go for GLMMs or go Bayesian. In the first case, power calculations are a bit blurry: as far as I know, there's no consensus on the ultimate way to compute power (some indications here, here, and here). simr might be a good R tool for simulation, but again, there's no consensus. To be safe, we could just stick to what we did in Marini et al. (2024), but again, @DominiqueMakowski is the absolute master here =)

Apart from that, I started drafting some R code to merge the CSVs and do some data cleaning, but I stopped when I realized that the response variable needed some JSON manipulation (i.e., all the DVs are within the same cell, whereas I would place their values in 3 different columns). I hope I'll continue in the next few days, although idk if I'll have enough time very soon. But wouldn't it be great if we had an R script in the data folder to analyze data in real-time? (...yes, I'm teasing you all!)
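
The JSON step described above could look roughly like this, sketched in Python to match the preprocessing script rather than R. It is only an illustration: the column name `response` and the three keys (`Arousal`, `Appeal`, `Emotion`) are hypothetical placeholders, not the names used in the actual data files.

```python
import csv
import io
import json


def expand_responses(rows, column="response"):
    """Replace a JSON-encoded cell with one column per key it contains."""
    expanded = []
    for row in rows:
        row = dict(row)  # copy so the input rows are left untouched
        values = json.loads(row.pop(column))  # e.g. {"Arousal": 3, ...}
        row.update(values)
        expanded.append(row)
    return expanded


# Toy CSV standing in for one participant file (made-up values and names).
raw = io.StringIO(
    'participant,stimulus,response\n'
    'p01,img_01.jpg,"{""Arousal"": 3, ""Appeal"": 4, ""Emotion"": 2}"\n'
    'p01,img_02.jpg,"{""Arousal"": 1, ""Appeal"": 2, ""Emotion"": 5}"\n'
)
rows = expand_responses(csv.DictReader(raw))
print(rows[0])
```

Because the new columns are taken from whatever keys the JSON cell contains, the same function would keep working if a rating scale were renamed or added.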

P.S.: Welcome on board @marcosperduti !! We're so glad to have you here!! Best of luck with the ethical approval ;)

marcosperduti commented 10 months ago

Best of luck with the ethical approval ;)

Thank you. BTW did you ask for ethical approval at your university? Or are you running the study under Dominique's ethical approval?

AleAnsani commented 10 months ago

@marcosperduti as for me, I haven't requested any approval from Jyväskylä. I don't think I'll recruit participants through their official channels, so I don't think I need it. But you raised a good point, maybe I'd better ask for this...

DominiqueMakowski commented 10 months ago

Italian version

Need to translate these variables. Once it's done and we're ready to deploy, I will uncomment the saving of the data for the Italian version and we can proceed

French/Spanish versions

You can copy and paste that instructions_english.js file, open it with a text editor (or VS Code in my case, or even RStudio), and translate all the variables (beware of the HTML syntax that indicates the formatting). Then you paste it here in this discussion or send it via email, and I'll add it to GitHub and create the links for the experiments

Power analysis

I don't like power analyses: I think they rely on absurd assumptions, tend to give unrealistic estimates, and simply don't scale well to the type of analysis that we want to do (i.e., going beyond t-tests and correlations). I tend to be from the school of "collect as much as you possibly can, then do the right stats to reliably estimate the uncertainty and incorporate that in the interpretation and discussion". (See this and this that just came out; as well as this). BUuuut that's only my personal opinion, so by all means feel free to disagree and do what you think is right 😁 Possible options I see are:

Collect enough data to minimize false positives (which we will easily reach for most analyses) and use a high evidence threshold
Run a power analysis on your past study as it's the closest to what we do and get a number from it
Do a power sensitivity analysis a posteriori (much more interesting and relevant than a priori power analysis IMO)
Run a multiverse analysis where our subteams analyze the data separately using different approaches and then we see if we reach the same conclusions
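
For the options that involve getting a number out of a power or sensitivity analysis, a simulation is one way to do it. The sketch below is deliberately crude and rests on assumptions that are not from this thread: it treats the design as a single difference score per participant, tests it with a normal approximation, and uses placeholder effect sizes. It is not the Bayesian GLMM mentioned in this comment, and `simulated_power` is a made-up helper name.

```python
import math
import random
import statistics


def simulated_power(n, effect, n_sims=1000, seed=1):
    """Share of simulated studies in which a within-participant difference
    of `effect` (in SD units) is detected at a two-sided alpha of .05."""
    rng = random.Random(seed)
    critical = statistics.NormalDist().inv_cdf(0.975)  # normal approximation
    hits = 0
    for _ in range(n_sims):
        # One simulated difference score per participant.
        diffs = [rng.gauss(effect, 1.0) for _ in range(n)]
        mean = sum(diffs) / n
        variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)
        standard_error = math.sqrt(variance / n)
        if abs(mean / standard_error) > critical:
            hits += 1
    return hits / n_sims


# Sweep sample sizes for a small (placeholder) effect.
for n in (20, 50, 100, 200):
    print(n, simulated_power(n, effect=0.2))
```

Sweeping `effect` for a fixed `n` instead turns the same function into a rough sensitivity analysis, i.e. which effect sizes a given sample could plausibly detect.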

I don't know if we're going to go for GLMMs or go Bayesian

Bayesian GLMMs 🙃 (though TBF I expect Frequentist GLMMs will give the same evidence)

R script

I see you're all hungry for some data eh ^^ I posted in analysis/:

The Python preprocessing script: this downloads, extracts and formats the data from OSF. Note that because the OSF repo is private, you'll need to create an OSF token (in the OSF profile settings); this gives you a code that you can then enter in the token="" variable to allow the script to access the storage. But this script doesn't do much beyond preprocessing anyway, so I wouldn't bother trying to run it: I'll be running it every now and then to update the data, and you can focus on the processing per se. It saves the data on GH as rawdata_* (one file for the task, one for the participants' info).
👉 The cleaning R script: it does some descriptive stuff, aims at removing "outliers" and stuff (mobile users, see below) and saves the clean, final data.

If possible, let's not post the analysis scripts publicly yet: as it's the job of the supervision students, I don't want them to have everything already done :) We can discuss results and show graphs etc., but just don't reveal the code ^^

Note:

Current data is mostly from some dodgy reddit communities (about porn), so it's only (not-your-average) guys, and many of them obviously did not even read the post (where I say "don't do it on mobile as it's not mobile friendly"); hence we have a lot of mobile users that I filter out in the cleaning. Maybe for the next versions we can simply block the experiment from the start for mobile users, but idk... ~60% of people reported it to be "fun" so 🤷‍♂️
So basically it's low quality data, so take the current results with a bucket of salt...
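
As an illustration of what filtering out mobile users can involve, here is a small heuristic sketch. It assumes that each participant record carries a browser user-agent string in a field called `user_agent`; that field name is hypothetical, and the actual cleaning script may rely on a different criterion altogether (screen size, for instance).

```python
import re

# Substrings that commonly appear in mobile browsers' user-agent strings.
MOBILE_PATTERN = re.compile(r"Mobi|Android|iPhone|iPad|iPod", re.IGNORECASE)


def is_mobile(user_agent):
    """Heuristic: True if the user-agent string looks like a mobile browser."""
    return bool(MOBILE_PATTERN.search(user_agent or ""))


# Made-up participant records; `user_agent` is a hypothetical field name.
participants = [
    {"id": "p01", "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0 Safari/537.36"},
    {"id": "p02", "user_agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) Mobile/15E148 Safari/604.1"},
    {"id": "p03", "user_agent": "Mozilla/5.0 (Linux; Android 14; Pixel 8) Chrome/120.0 Mobile Safari/537.36"},
]
desktop_only = [p for p in participants if not is_mobile(p["user_agent"])]
print([p["id"] for p in desktop_only])
```

User-agent sniffing is imperfect (tablets and "desktop mode" browsers can slip through), so it is better seen as a first-pass filter than as a guarantee.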

Results spoiler alert

It seems like our manipulation works, but not primarily on the scale that we would have hoped... and not as strongly as I expected

We can open a new issue called "analysis" to discuss stuff related to data preprocessing / analysis

DominiqueMakowski commented 10 months ago

[Images: reddit1, reddit3]

😅

MarcoViola88 commented 10 months ago

Italian version

Need to translate these variables.

Sorry, for some reason (e.g. being a total noob) I cannot edit (cf. below). Let me place the translation here, after a slash, i.e. "/":

var button_continue = "Continue" / "Continua"
var button_end = "End" / "Fine"

var demographics1_preamble = "Please answer the following questions:" / "Per favore, rispondi alle seguenti domande:"
var demographics_q_sex = "What is your biological sex?" / "Qual è il tuo sesso biologico?"
var demographics_c_sex = ["Male", "Female", "Other"] / ["Maschio", "Femmina", "Altro"]
var demographics_q_edu = "What is your highest completed education level?" / "Qual è il titolo di studio più alto che hai conseguito?"
var demographics_c_edu = [
"University (doctorate)", / "Università (dottorato)",
"University (master) _{^{or equivalent}}", / "Università (laurea magistrale) _{^{o equivalente}}",
"University (bachelor) _{^{or equivalent}}", / "Università (laurea triennale) _{^{o equivalente}}",
"High school _{^{or equivalent}}", / "Scuola superiore _{^{o equivalente}}",
"Primary school", / "Scuola dell'obbligo",
"Other", / "Altro",
]
var demographics_q_age = "Please enter your age (in years)" / "Per favore immetti la tua età (in anni)"
var demographics_p_age = "e.g., '31'" / "per es. 31"
var demographics_q_eth = "Please enter your ethnicity" / "Per favore immetti la tua etnia"
var demographics_p_eth = "e.g., Caucasian" / "per es. Caucasica"
var demographics_q_cou = "In which country do you currently live?" / "In che paese risiedi attualmente?"
var demographics_p_cou = "e.g., UK, Spain" / "per es. Italia, UK"
var demographics_q_lang = "How would you rate your level of English?" / "Come giudicheresti il tuo livello di Inglese?"
var demographics_c_lang = ["Beginner - 0", "1", "2", "3", "4", "5", "6 - Native"] / ["Principiante - 0", "1", "2", "3", "4", "5", "6 - Madrelingua"]
var demographics_q_ai = "How knowledgeable do you consider yourself about Artificial Intelligence (AI) technology?" / "Quanta familiarità ritieni di avere con l'Intelligenza Artificiale? (IA)"
var demographics_c_ai = ["Not at all - 0", "1", "2", "3", "4", "5", "6 - Expert"] / ["Nessuna - 0", "1", "2", "3", "4", "5", "6 - Molta"]
var demographics_hormones_preamble = "The following question is important to understand the role of potential biological factors in our study. It is however optional, and you can skip it if you want." / "La prossima domanda ci interessa al fine di comprendere il ruolo potenziale di fattori biologici nel nostro studio. Si tratta però di una domanda opzionale, se vuoi puoi saltarla."
var demographics_q_hormones = "If you are a female, are you currently using birth control treatment?" / "Se sei femmina, stai attualmente utilizzando dei trattamenti contraccettivi?"
var demographics_c_hormones = [
"No", / "No",
"Yes - contraceptive pills (combined pills)", / "Sì - pillole contraccettive (pillole combinate)",
"Yes - contraceptive pills (progestogen-only pills)", / "Sì - pillole contraccettive (pillole a base di solo progestinico)",
"Yes - intrauterine device (copper coil, IUD)", / "Sì - dispositivo intrauterino (spirale, coppetta intrauterina)",
"Yes - intrauterine system (IUS)", / "Sì - dispositivo intrauterino a base di ormoni (IUS)",
"Yes - female condoms", / "Sì - condom femminili",
"Yes - condoms for partner", / "Sì - condom per il partner",
"Yes - other", / "Sì - altri",

... Or please kindly tell me how to make the changes myself!

DominiqueMakowski commented 10 months ago

Weird that it doesn't let you (the mysteries of GitHub). Anyway, I updated the file; let's make sure we didn't forget any line or button
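
One way to check that no line or button was forgotten is to compare the variable names declared in the English file with those in the translated one. The sketch below assumes that every string is declared at the start of a line as `var name = ...`, as in the snippet quoted in this thread; the helper names and the toy file contents are made up for illustration.

```python
import re

# Matches declarations of the form `var some_name =` at the start of a line.
VAR_PATTERN = re.compile(r"^\s*var\s+(\w+)\s*=", re.MULTILINE)


def declared_variables(js_source):
    """Names of all line-initial `var name = ...` declarations in a JS source string."""
    return set(VAR_PATTERN.findall(js_source))


def missing_translations(reference_js, translated_js):
    """Variables declared in the reference file but absent from the translation."""
    return sorted(declared_variables(reference_js) - declared_variables(translated_js))


# Toy stand-ins for the English and Italian instruction files.
english = '''
var button_continue = "Continue"
var button_end = "End"
var demographics_q_age = "Please enter your age (in years)"
'''
italian = '''
var button_continue = "Continua"
var demographics_q_age = "Per favore immetti la tua età (in anni)"
'''
print(missing_translations(english, italian))
```

In practice the two strings would be read from the actual instruction files; swapping the arguments would instead list variables that exist only in the translation.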