IQSS / dataverse

Open source research data repository software
http://dataverse.org

Feature Request/Idea: switching from Sphinx docs to using Docusaurus #8905

Open kuhlaid opened 2 years ago

kuhlaid commented 2 years ago

Overview of the Feature Request

What are your thoughts on switching from Sphinx docs to using Docusaurus for the Dataverse documentation?

The search feature in Sphinx is no good and does not handle quoted 'specific' text searches very well, if at all. For instance, if I search on "categorical labels" it gives me top results on items that are not specific to these two words. Docusaurus has built-in support for Algolia document search, which is free to Open Source projects and works extremely well on specific text searches. This makes searching things in the documents much easier.
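To make the complaint concrete, here is a minimal sketch of the difference between matching any query term and matching the exact phrase. This is not how Sphinx or Algolia actually implement search; the function names and the sample sentences are hypothetical and only illustrate why a quoted search for "categorical labels" can surface pages that merely mention one of the two words.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_any_term(query, document):
    """True if the document contains at least one of the query words."""
    doc_terms = set(tokenize(document))
    return any(term in doc_terms for term in tokenize(query))


def matches_phrase(query, document):
    """True only if the query words appear consecutively, in order."""
    q = tokenize(query)
    d = tokenize(document)
    n = len(q)
    return n > 0 and any(d[i:i + n] == q for i in range(len(d) - n + 1))


if __name__ == "__main__":
    # Hypothetical page snippets, not taken from the Dataverse guides.
    pages = [
        "Labels can be attached to any dataset.",
        "Tabular files may include categorical labels for variables.",
    ]
    query = '"categorical labels"'
    for page in pages:
        print(matches_any_term(query, page), matches_phrase(query, page), page)
```

A term-based matcher accepts both snippets, while the phrase matcher accepts only the second one, which is the behavior a user typing a quoted query would expect.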

I would be willing to work on transitioning the docs to Docusaurus if there is interest in it. Anyway, just something to think about that might improve the Dataverse documentation.

pdurbin commented 2 years ago

@kuhlaid hi, there's a related issue here:

Back then I was suggesting downloading the PDF version of the guides but https://guides.dataverse.org/en/latest/Dataverse.pdf is a 404 these days so I guess we don't build a PDF anymore. It looks like we had PDFs as of 4.17 (see #6168): https://guides.dataverse.org/en/4.17/Dataverse.pdf

@kuhlaid I'm wondering, would you be able to do the searches you like in a PDF? Could that be a workaround?

I'm a pretty big fan of Sphinx so I'm personally not very excited about switching unless the benefits really outweigh the costs.

(Docusaurus does look pretty cool from a quick look.) 😄

poikilotherm commented 2 years ago

Please note that you can also add Algolia to Sphinx docs. A quick search revealed people succeeded with that already.

On a related note: just by moving to Docusaurus, the underlying substantial issues of our documentation won't go away.

kuhlaid commented 2 years ago

@pdurbin I do not suggest PDF as a main source of searching documentation just from a usability standpoint. Most users will see the Sphinx search box in the documentation and go straight for that.

Does Sphinx search give you search stats to help improve documentation structuring and keywords? I'm not knocking all of Sphinx, just the search, which is no es bueno. So, if Algolia or similar could replace the Sphinx search, that might be a step in the right direction.

@poikilotherm I would be curious to hear more regarding the underlying substantial issues of our documentation that need to be addressed. I will say that the API documentation is rather sparse, but I understand that there are only so many hours in a day to address these things. I think this is a valuable project and only want to help see it succeed.

kuhlaid commented 2 years ago

Just for giggles I started building out a Docusaurus version of the Dataverse docs (dataverse-docusaurus.vercel.app) as a proof of concept. Since the pages are static they load instantly and updates to the repository get automatically pushed to Vercel for testing. I will throw in the search feature once I have converted more of the docs.

pdurbin commented 2 years ago

@kuhlaid neat! Is the source of the docs on GitHub? Ah, I think I found it: https://github.com/kuhlaid/dataverse-docusaurus

Markdown instead of .rst. Interesting. 😄

pdurbin commented 2 years ago

https://podrocket.logrocket.com/docusaurus was a good listen. I dunno, it's a lot of work to switch! 😄

kuhlaid commented 2 years ago

I gave up on Algolia. With Algolia you can't create a testing/development environment, ONLY PRODUCTION. That was ridiculous. I also tested Typesense, but it is too immature at this point: it has limitations on searching items in quotes and code blocks, and it is a pain to set up with the scraper (which is a huge pain point that likely prompted Algolia to ditch it from their processes). So I decided to switch to local search. Anyway, the test docs I compiled are running at https://dataverse-docusaurus.vercel.app with local search installed. Local search does a better job of searching within code blocks and quotes.

pdurbin commented 1 year ago

@kuhlaid very cool. The search seems nice.

I just mentioned this issue and your efforts to @siacus. Like me, he's a pretty big fan of Markdown. A lot of people haven't even heard of reStructuredText, making it harder to contribute beyond simple text changes. I even opened an issue about writing up some tips on using rst:

I'm still on the fence though. It's a lot of work to switch. Plus, I'm not sure what features we'd lose such as the ability to create PDFs and ePubs.

kuhlaid commented 1 year ago

@pdurbin I thought I had posted the following suggestion of keeping Sphinx (but I guess I never pressed send). Anyway, my thought was: keep Sphinx, but add a link to the PDF version of the Dataverse documentation beside the Sphinx search field, with a note along the lines of "for more robust searching, search within the PDF version of our documentation". That might improve the existing search situation. In its current state, I don't even know where to find a PDF version of the Dataverse docs within the docs site.

pdurbin commented 1 year ago

@kuhlaid https://guides.dataverse.org/en/latest/Dataverse.pdf is a 404 😢

https://guides.dataverse.org/en/4.17/Dataverse.pdf works. There's a PDF to download if you want to look at it. So it's been a while (the 4.x days, I guess) since we built the PDF. This issue from 4.17 is related:

Anyway, I agree with you. It would be nice to have a PDF of the guides again. And we could link to them from the HTML guides, once they're working, and suggest it as an alternative for searching. Do you want to create an issue for this?

Also, how do you feel about ePub? The ePub build is still working, it seems: https://guides.dataverse.org/en/latest/Dataverse.epub

kuhlaid commented 1 year ago

From the https://github.com/IQSS/dataverse/issues/6168 issue, it appears as though the PDF generator is unhappy about the non-ASCII characters within the documentation source files. To search for files containing any non-ASCII characters, you can run the following within a bash terminal:

LC_ALL=C find . -type f -exec grep -c -P "[^\x00-\x7F]" {} +        # within the current directory, list each file with the number of lines in it that contain non-ASCII characters
LC_ALL=C grep --color='auto' -P -n "[\x80-\xFF]" somefile.txt   # shows where non-ASCII characters are found in a file (copy the results to an empty text file or somewhere to reference)
nano somefile.txt   # use CTRL+W then CTRL+T to go to a specific line in the file

I was not able to find any non-ASCII characters in the current files.
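For anyone without GNU grep's -P option (it is not available everywhere), the same check can be sketched in Python. This is only an illustration under assumptions: the function names are made up for this example, and the choice to scan only .rst and .md files is a guess about which source files matter for the PDF build.

```python
from pathlib import Path


def find_non_ascii(text):
    """Return (line_number, column, character) for every non-ASCII character."""
    hits = []
    # Split on "\n" only, so unusual Unicode line separators are still reported.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 0x7F:
                hits.append((line_no, col, ch))
    return hits


def scan_tree(root, suffixes=(".rst", ".md")):
    """Map each matching file under root to its non-ASCII hits (files with none are skipped)."""
    report = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            # Undecodable bytes become U+FFFD, which is itself reported as a hit.
            text = path.read_text(encoding="utf-8", errors="replace")
            hits = find_non_ascii(text)
            if hits:
                report[str(path)] = hits
    return report


if __name__ == "__main__":
    # Prints file:line:column for each offending character, starting from the current directory.
    for name, hits in scan_tree(".").items():
        for line_no, col, ch in hits:
            print(f"{name}:{line_no}:{col}: {ch!r} (U+{ord(ch):04X})")
```

Unlike the grep -c form, this reports the exact line and column of each character, so there is no need for a second pass over individual files.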

pdurbin commented 11 months ago

We're adding Markdown support. Please see this issue:

pdurbin commented 2 months ago

The Contributor Guide is written in Markdown.