JustAnotherArchivist / snscrape

A social networking service scraper in Python
GNU General Public License v3.0
4.51k stars, 712 forks

How do I cite you? #229

Closed Lassehhansen closed 2 years ago

Lassehhansen commented 3 years ago

I'm currently writing a paper using Twitter data that was scraped with your package. How may I cite you?

JustAnotherArchivist commented 3 years ago

Good question, thank you for bringing that up! I've never dealt with software citations before, but based on some cursory search, I would suggest the following:

Author: JustAnotherArchivist
Title: snscrape: A social networking service scraper in Python
URL: https://github.com/JustAnotherArchivist/snscrape
Version: the version you used, e.g. v0.3.4, or the commit ID if you used code that doesn't have a version number, e.g. commit 97c8caea
Date: matching the version, e.g. 2018–2020 for the latest release or 2018–2021 for the current development version; if your citation style mandates a single year rather than a range, use the date of the release/commit.
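For anyone who manages references with BibTeX, the fields suggested above could be assembled roughly as follows. This is only a sketch: the `bibtex_entry` helper, the `@misc` entry type, and the citation key are illustrative assumptions rather than anything the project prescribes, and `version` is not a standard BibTeX field (some styles ignore it).

```python
def bibtex_entry(version: str, year: str, key: str = "snscrape") -> str:
    """Build a BibTeX @misc entry from the fields suggested in this thread.

    Hypothetical helper for illustration; not part of snscrape.
    """
    fields = {
        # Extra braces keep BibTeX from splitting the pseudonym into name parts.
        "author": "{JustAnotherArchivist}",
        "title": "snscrape: A social networking service scraper in Python",
        "url": "https://github.com/JustAnotherArchivist/snscrape",
        "version": version,  # release tag or commit ID you actually used
        "year": year,        # year (or range) matching that version
    }
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@misc{{{key},\n{body}\n}}"


if __name__ == "__main__":
    print(bibtex_entry("v0.3.4", "2020"))
```

Pass whichever version or commit ID you actually ran, and adjust the entry type to whatever your citation style expects for software.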

As I said, I have no experience with this, so if you have any suggestions whatsoever, please do let me know!

Lassehhansen commented 3 years ago

Well, there is no 'right' way to do it, of course. But I would probably include your actual name for the authorship? And "et al." would only apply if the other authors are mentioned by name and there are more than three of them.

JustAnotherArchivist commented 3 years ago

I currently do not wish to link my real identity to this project.

Maintaining a full list and keeping it accurate for the different versions etc. would certainly be annoying. But in any case, as far as I can see, software citations typically only reference the main author(s), not all contributors, and that makes sense to me. Will edit the comment above accordingly.

Lassehhansen commented 3 years ago

Ahh okay, I totally get that. I will use the information you gave above then :)

gavox commented 3 years ago

Hello! I am using version 0.3.5. I cannot find a way to determine the date, but I assume it would be dated 2021. Is that correct? Thanks!

JustAnotherArchivist commented 3 years ago

There is no version 0.3.5. The last versioned release was 0.3.4 in July 2020. If you're running the development version, I recommend using the commit ID. The string reported by snscrape --version also works; it is auto-generated when you're installing from the repository and includes the abbreviated commit ID as the last seven characters (after +g). However, don't rely on that output if you installed snscrape in edit mode (python3 setup.py develop or pip install -e) and manipulated the repository in any way after the installation!
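As a rough illustration of the point about the `+g` suffix, the abbreviated commit ID can be pulled out of a version string like the ones `snscrape --version` reports for repository installs. The helper names below are hypothetical, and the sketch matches the hex digits after `+g` rather than slicing the last seven characters, on the assumption that extra local-version parts could follow the commit ID.

```python
import re
from typing import Optional

REPO_URL = "https://github.com/JustAnotherArchivist/snscrape"


def commit_id_from_version(version: str) -> Optional[str]:
    """Return the abbreviated commit ID following '+g', or None if absent."""
    match = re.search(r"\+g([0-9a-f]{7,})", version)
    return match.group(1) if match else None


def commit_url(version: str) -> Optional[str]:
    """Build the GitHub commit page URL, where the commit date can be looked up."""
    commit_id = commit_id_from_version(version)
    return f"{REPO_URL}/commit/{commit_id}" if commit_id else None


if __name__ == "__main__":
    print(commit_id_from_version("0.3.5.dev138+ga6b6f3f"))  # a6b6f3f
    print(commit_url("0.3.5.dev138+ga6b6f3f"))
    print(commit_id_from_version("0.3.4"))  # None: a versioned release
```

The same caveat applies as above: if the install was done in edit mode and the repository changed afterwards, the reported string may no longer match the code that actually ran.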

To check the date of a particular commit without using Git, you can use GitHub's web interface: https://github.com/JustAnotherArchivist/snscrape/commit/660b8c7.

I'll look into whether it's possible to add a snscrape --citation flag that prints all the necessary information for convenience.

JustAnotherArchivist commented 3 years ago

An update on that: it doesn't seem to be easily possible. The tricky part is the date/year info. I do not want to have that in the code, just like the version isn't anywhere in the code. There is no package metadata field for a date or year of release, nor is there a way to specify custom metadata fields. This would have to be implemented as a distutils extension while also hooking into setuptools_scm to get the info from Git at sdist/install time. The other option is including it in the version number, but the local part (...+foobar) is not allowed by PyPI, so it would have to be in the actual version number. Just about any way of doing that seems really nasty except YYYY.x-style version numbers for releases. I'm not totally opposed to that (although I prefer SemVer), but that wouldn't cover dev version installations. I'll have to ponder this a bit more...

JustAnotherArchivist commented 2 years ago

snscrape --citation is now implemented, although it will not work correctly until the release (coming very soon).

I decided to stick to SemVer but add a date at the end, e.g. v1.2.3.20210230. The --citation code then extracts the year from that and displays it in the date line.

This is obviously not ideal, but I could not find any solution that would work reliably with dev version installations without unreasonable effort. Therefore, if you run --citation from a dev version (as detected from .dev in the version number), there will be a warning that the date may be incorrect and to fix it manually. A further implication of this is that the version number on dev installations will be slightly weird and may report invalid dates since setuptools_scm increments the last part of the version number. However, I don't consider this a big concern. I will try to keep the released version more up to date in the future so that people generally won't have to use the dev version in the first place.
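To make the scheme concrete, here is a minimal sketch of how a year could be read from a date-suffixed version number and how a dev install could be flagged. It is not the actual `--citation` implementation; the `citation_year` function and its regular expression are assumptions based only on the description above.

```python
import re
from typing import Optional, Tuple


def citation_year(version: str) -> Tuple[Optional[str], bool]:
    """Return (year, is_dev) for a version such as '0.4.3.20220106'.

    year is None when the version has no YYYYMMDD component;
    is_dev is True when '.dev' appears, meaning the date may be unreliable.
    """
    is_dev = ".dev" in version
    # Drop an optional leading 'v' and any local part after '+'.
    public = version.lstrip("v").split("+", 1)[0]
    # Look for a version component made of exactly eight digits (YYYYMMDD).
    match = re.search(r"\.(\d{4})\d{4}(?:\.|$)", public)
    return (match.group(1) if match else None, is_dev)


if __name__ == "__main__":
    for v in ("0.4.3.20220106", "v1.2.3.20210230", "0.3.5.dev138+ga6b6f3f"):
        year, is_dev = citation_year(v)
        note = " (dev version: check the date manually)" if is_dev else ""
        print(f"{v}: year={year}{note}")
```

The sketch deliberately does not validate the date itself, since dev installs may report invalid dates as noted above.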

marymlucas commented 2 years ago

Hi, I'm trying to cite snscrape in a paper I'm working on and found this post. But running snscrape --citation doesn't work for me on the command line. I get the error message:

snscrape: error: the following arguments are required: SCRAPER

I'm on version 0.3.5.dev138+ga6b6f3f. Any suggestions?

JustAnotherArchivist commented 2 years ago

--citation is only available from version 0.4.0. Please refer to my comment above instead.

marymlucas commented 2 years ago

Ah thank you. That was the issue. I just updated to 0.4.3.20220106 and all good now.