bootlin / elixir

The Elixir Cross Referencer
GNU Affero General Public License v3.0
957 stars 141 forks source link

Inline reference preview/lists are broken #269

Closed mwalle closed 7 months ago

mwalle commented 7 months ago

At least for u-boot the the references list are broken. I.e. choose "u-boot" -> lib/rand.c -> click on srand, then i'll get "No definitions found in the database", although if you search it manually you get some results (e.g. https://elixir.bootlin.com/u-boot/v2024.01-rc5/A/ident/srand)

Tested with Elixir 2.2 on elixir.bootlin.com

tpetazzoni commented 7 months ago

This seems to be a regression that came with the new HTML popup. @fstachura, do you think you could have a look ?

fstachura commented 7 months ago

@mwalle Sorry to hear that. As a temporary fix - middle or ctrl click will open the identifier in a new tab, bypassing the popup. The results seem to be correct there. Alternatively, disabling javascript for elixir.bootlin.com with a browser extension should revert back to the old behavior.

@tpetazzoni Sure, it seems that the API incorrectly returns empty results for this identifier https://elixir.bootlin.com/api/ident/u-boot/srand?version=latest&family=C

The same URL without the family parameter returns correct results (NOTE: try it with curl, for some reason that link still returns empty results when opened in a browser) https://elixir.bootlin.com/api/ident/u-boot/srand?version=latest

mwalle commented 7 months ago

@mwalle Sorry to hear that. As a temporary fix - middle or ctrl click will open the identifier in a new tab, bypassing the popup. The results seem to be correct there. Alternatively, disabling javascript for elixir.bootlin.com with a browser extension should revert back to the old behavior.

Thanks for the productivity boost ;)

tpetazzoni commented 7 months ago

Thanks a lot @fstachura for the initial investigation! Indeed, looks like a server-side issue then.

fstachura commented 7 months ago

@tpetazzoni I downloaded and indexed u-boot sources on my local machine, and the results are correct there, even with the family parameter. I have a suspicion that the cache is somehow involved.

fstachura commented 7 months ago

I noticed another strange thing. I added a URL parameter to force Varnish to return uncached content. On some requests that bypass cache, the backend returns correct data.

% for n in `seq 4000 4100`; do echo https://elixir.bootlin.com/api/ident/u-boot/srand\?version\=latest\&family\=C\&test\=$n; curl https://elixir.bootlin.com/api/ident/u-boot/srand\?version\=latest\&family\=C\&test\=$n; echo; done

https://elixir.bootlin.com/api/ident/u-boot/srand?version=latest&family=C&test=4000
{"definitions": [], "references": [], "documentations": []}
https://elixir.bootlin.com/api/ident/u-boot/srand?version=latest&family=C&test=4001
{"definitions": [], "references": [], "documentations": []}
https://elixir.bootlin.com/api/ident/u-boot/srand?version=latest&family=C&test=4002
{"definitions": [], "references": [], "documentations": []}
https://elixir.bootlin.com/api/ident/u-boot/srand?version=latest&family=C&test=4003
{"definitions": [], "references": [], "documentations": []}
https://elixir.bootlin.com/api/ident/u-boot/srand?version=latest&family=C&test=4004
{"definitions": [], "references": [], "documentations": []}
https://elixir.bootlin.com/api/ident/u-boot/srand?version=latest&family=C&test=4005
{"definitions": [], "references": [], "documentations": []}
https://elixir.bootlin.com/api/ident/u-boot/srand?version=latest&family=C&test=4006
{"definitions": [], "references": [], "documentations": []}
https://elixir.bootlin.com/api/ident/u-boot/srand?version=latest&family=C&test=4007
{"definitions": [], "references": [], "documentations": []}
https://elixir.bootlin.com/api/ident/u-boot/srand?version=latest&family=C&test=4008
{"definitions": [], "references": [], "documentations": []}
https://elixir.bootlin.com/api/ident/u-boot/srand?version=latest&family=C&test=4009
{"definitions": [{"path": "include/rand.h", "line": 20, "type": "prototype"}, {"path": "drivers/crypto/ace_sha.c", "line": 122, "type": "function"}, {"path": "drivers/rng/npcm_rng.c", "line": 72, "type": "function"}, {"path": "lib/rand.c", "line": 29, "type": "function"}], "references": [{"path": "cmd/mem.c", "line": "1297", "type": null}, {"path": "drivers/crypto/ace_sha.c", "line": "152,181", "type": null}, {"path": "drivers/ram/stm32mp1/stm32mp1_tests.c", "line": "616,619", "type": null}, {"path": "drivers/rng/sandbox_rng.c", "line": "24", "type": null}, {"path": "lib/uuid.c", "line": "477,479", "type": null}, {"path": "net/dhcpv6.c", "line": "618", "type": null}, {"path": "net/net_rand.h", "line": "55,57", "type": null}, {"path": "scripts/kconfig/conf.c", "line": "537", "type": null}, {"path": "test/dm/mux-cmd.c", "line": "129,163", "type": null}, {"path": "tools/gen_eth_addr.c", "line": "17", "type": null}], "documentations": [{"path": "include/rand.h", "line": "12", "type": null}]}
https://elixir.bootlin.com/api/ident/u-boot/srand?version=latest&family=C&test=4010
{"definitions": [], "references": [], "documentations": []}
https://elixir.bootlin.com/api/ident/u-boot/srand?version=latest&family=C&test=4011
{"definitions": [{"path": "include/rand.h", "line": 20, "type": "prototype"}, {"path": "drivers/crypto/ace_sha.c", "line": 122, "type": "function"}, {"path": "drivers/rng/npcm_rng.c", "line": 72, "type": "function"}, {"path": "lib/rand.c", "line": 29, "type": "function"}], "references": [{"path": "cmd/mem.c", "line": "1297", "type": null}, {"path": "drivers/crypto/ace_sha.c", "line": "152,181", "type": null}, {"path": "drivers/ram/stm32mp1/stm32mp1_tests.c", "line": "616,619", "type": null}, {"path": "drivers/rng/sandbox_rng.c", "line": "24", "type": null}, {"path": "lib/uuid.c", "line": "477,479", "type": null}, {"path": "net/dhcpv6.c", "line": "618", "type": null}, {"path": "net/net_rand.h", "line": "55,57", "type": null}, {"path": "scripts/kconfig/conf.c", "line": "537", "type": null}, {"path": "test/dm/mux-cmd.c", "line": "129,163", "type": null}, {"path": "tools/gen_eth_addr.c", "line": "17", "type": null}], "documentations": [{"path": "include/rand.h", "line": "12", "type": null}]}
fstachura commented 7 months ago

Ok, I think I know what the issue is, or at least why separate requests sometimes return different results for the same URL. I have managed to reproduce something similar on my local machine. Cache is not at fault (obviously).

api.py is a WSGI script. mod_wsgi starts multiple processes of the same WSGI script https://github.com/bootlin/elixir/blob/master/api/api.py#L40 build_query imports query (local import, executed on each call) https://github.com/bootlin/elixir/blob/master/query.py#L30 query.db is a module-global variable that is initialized on first import. From what I understand, it is an interface to project data https://github.com/bootlin/elixir/blob/master/lib.py#L187 lib.getDataDir() simply returns $LXR_DATA_DIR which is set to the data directory of a project on each build_query call. This value is used to initialize query.db.

I think that the assumption made in build_query was that on each call, the local import statement will reimport the module, recreating all the global variables from that module. That does not seem to be true, imports are globally cached. https://docs.python.org/3/reference/import.html#the-module-cache

So, there is separate query.db for each process, and it is initialized on the first bulid_query call. On a fresh apache start, if the first request is for something from Linux codebase, and the second request is for something from Musl codebase, its likely that each request will go to a different process. In each process, query.db will be set to a different database. Then, following requests might unexpectedly return empty data (which will be then cached etc etc) because some requests will go to the process that has the Linux db, and some will go the the process that has the musl db. This was never a problem in the original cgi script, because it's re-executed from scratch on each HTTP request.

I think that there still is something missing in my reasoning. The bug would imply that the number of projects working correctly is limited by the number of WSGI processes started by apache. I guess that most people use Elixir for browsing Linux sources, which would mean that on most processes query.db points to the Linux project database. Meanwhile all projects from elixir.bootlin.com that I tried to browse seemed to be working fine. But I wasn't looking very closely. Maybe the cache or the algorithm that picks the process that will handle a request somehow helped to maintain the illusion that everything is OK.

I see two possible fixes:

I'm gonna go ahead and open a PR with the first fix, since this probably impacts a sizeable chunk of the userbase and therefore is more or less urgent.

tpetazzoni commented 7 months ago

Thanks a lot @fstachura for the investigation and quick fix!

I am personally not familiar with the Elixir code base, so I find the need to force re-import a module to not be very great. And it seems like you agree: this is the quick and dirty fix, and we need a more long term fix that reworks the code so that query.py is better structured to handle multiple projects.

I'll let @michaelopdenacker handle your pull request with the quick fix.

fstachura commented 7 months ago

The quick fix is not enough. Mod_wsgi utilizes multithreading and multiprocessing, meanwhile this solution does not seem thread safe (and I'm actually not sure if DB is thread safe at all)

fstachura commented 7 months ago

@mwalle A fix for this was just deployed on elixir.bootlin.com. Could you try to reproduce the bug now?

mwalle commented 7 months ago

@fstachura thanks, seems to work now!

michaelopdenacker commented 7 months ago

Thanks for confirming. Closing this bug.