Lookyloo / lookyloo

Lookyloo is a web interface that allows users to capture a website page and then display a tree of domains that call each other.
https://www.lookyloo.eu

Naming of things (consistency and clarity issues) #118

Closed: Rafiot closed this issue 2 years ago

Rafiot commented 3 years ago

The following terms need to be replaced everywhere:

matt-ross16 commented 3 years ago

I see that you've changed the naming of scrape and flag. Would you like me to go through the repo and make the final change, legitimate -> known?

Rafiot commented 3 years ago

Sure! This one is a bit trickier, as I use legitimate all over the place in the contextualization module (context.py). If you're up for it, please go for it. It isn't urgent, so take your time.

And let me know if you have any questions!

matt-ross16 commented 3 years ago

So are there any instances of legitimate that need to stay the same? Or is it safe to search for every occurrence of legitimate and change it regardless?

Rafiot commented 3 years ago

No, I think everything should be renamed to known, but I'm not completely sure, which is why I haven't done it yet. There might be places where legitimate makes sense, or where "known" doesn't make sense and we will need to use another term (depending on the context).

But if you start by renaming it everywhere, and making sure it works, I can review the PR and figure out where it should be done differently.

matt-ross16 commented 3 years ago

Hi @Rafiot,

I'm having some issues running the program through poetry. Any idea what is causing this?

[screenshot]

It seems to be stuck, and when I navigate to http://0.0.0.0:5100/, I get ERR_ADDRESS_INVALID.

Rafiot commented 3 years ago

Hmm, I'm not totally sure. The warnings don't matter (the modules need API keys, and they're optional).

The website shutting itself down immediately is weird and shouldn't happen. Can you try the following and paste the output from the terminal:

poetry shell
stop  # to make sure nothing is running
start  # to start the redis DBs
start_website  # to start the website only (it is also done by start, but you might get more logging)

matt-ross16 commented 3 years ago

I wasn't able to get any further than stop.

It looks like it got stuck in a loop, and I have to force-close it to get out. Is this expected?

[screenshot]

The errors after I force-closed it:

[screenshot]

Rafiot commented 3 years ago

So this error happens because the async script failed to start and didn't clean up redis, but that doesn't explain why it isn't starting.

Did you get any errors when you ran poetry install?

Rafiot commented 3 years ago

@matt-ross16, any news on the issues? I'd like to figure out what's going on, but it's very confusing right now: I installed a new instance today following the guide and it worked just fine, so I guess I'm doing something that isn't documented.

matt-ross16 commented 3 years ago

Hey, sorry for keeping you waiting! I eventually found out that the Python version being used was 3.6 rather than 3.7, so I'm now working through everything again. Silly me!

Rafiot commented 3 years ago

Oh right, that makes sense. In theory, poetry install should have failed. If it didn't, I need to add a check for that.
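A minimal sketch of such a check, assuming Python 3.7 is the minimum supported version as discussed above. The function name check_python_version is hypothetical; this is not code from lookyloo:

```python
import sys

# Assumption: lookyloo required Python 3.7+ at the time of this thread.
MIN_PYTHON = (3, 7)


def check_python_version(version_info=sys.version_info, minimum=MIN_PYTHON) -> None:
    """Raise RuntimeError if the interpreter is older than `minimum`."""
    # Compare only (major, minor) so that patch releases don't matter.
    current = tuple(version_info[:2])
    if current < minimum:
        raise RuntimeError(
            f"Python {minimum[0]}.{minimum[1]}+ is required, "
            f"but this interpreter is {current[0]}.{current[1]}."
        )


if __name__ == "__main__":
    check_python_version()
    print("Python version OK")
```

Calling it at the top of an entry point makes a wrong interpreter (like the 3.6 one above) fail loudly instead of hanging later.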

matt-ross16 commented 3 years ago

I did have problems initially, but it seemed like I had eventually gotten past them... I don't know for sure. Working through it properly now.

Rafiot commented 3 years ago

All good. If anything pops up, don't hesitate to open an issue or submit a PR, either here or in the docs repo.

matt-ross16 commented 3 years ago

OK, I found the issue I had with poetry the first time: when installing with the curl command, poetry isn't added to the PATH right away, so the poetry command isn't recognized. That's why I got stuck on it for a while.

I needed to run source $HOME/.poetry/env before the poetry commands would work.
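The symptom can be diagnosed from Python with shutil.which, which searches PATH the same way the shell does. A minimal sketch: the $HOME/.poetry/bin location is an assumption based on the legacy curl installer whose env script is sourced above (newer installers may use a different directory), and the helper names are hypothetical:

```python
import os
import shutil
from pathlib import Path
from typing import Optional

# Assumption: the legacy curl installer put poetry in $HOME/.poetry/bin,
# which `source $HOME/.poetry/env` adds to PATH.
LEGACY_POETRY_BIN = Path.home() / ".poetry" / "bin"


def locate_poetry(path_env: Optional[str] = None) -> Optional[str]:
    """Return the path to a poetry executable on `path_env` (default: PATH), or None."""
    return shutil.which("poetry", path=path_env)


def diagnose() -> str:
    found = locate_poetry()
    if found:
        return f"poetry found on PATH: {found}"
    # Retry with the legacy install directory prepended, without touching os.environ.
    extended = os.pathsep.join([str(LEGACY_POETRY_BIN), os.environ.get("PATH", "")])
    if locate_poetry(extended):
        return (f"poetry is installed in {LEGACY_POETRY_BIN} but not on PATH; "
                "run: source $HOME/.poetry/env")
    return "poetry not found"


if __name__ == "__main__":
    print(diagnose())
```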

matt-ross16 commented 3 years ago

Not sure why I'm getting hung up on this:

[screenshot]

This was the error when hard quitting:

[screenshot]

Rafiot commented 3 years ago

What if you give it some time? Normally, on the first run, the script checks whether lookyloo was started with systemd; if not, it falls back to poetry run stop and poetry run start. Within 10-15 seconds, poetry run start should start the website and give you the terminal back.
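The fallback described above can be sketched in Python as follows. This is not lookyloo's actual update script: the service name lookyloo, the use of systemctl is-active to detect it, and the systemctl restart command are all assumptions made for illustration:

```python
import subprocess
from typing import List


def started_with_systemd(service: str = "lookyloo") -> bool:
    """Return True if systemd reports the (assumed) service as active."""
    try:
        # --quiet suppresses output; the exit code is 0 only for an active unit.
        result = subprocess.run(
            ["systemctl", "is-active", "--quiet", service],
            check=False,
        )
    except OSError:
        # systemctl is missing or not executable (e.g. macOS, minimal containers).
        return False
    return result.returncode == 0


def restart_commands(use_systemd: bool, service: str = "lookyloo") -> List[List[str]]:
    """Build the command sequence used to (re)start lookyloo."""
    if use_systemd:
        # Hypothetical: restarting a system unit usually needs root.
        return [["systemctl", "restart", service]]
    # Fallback: stop everything, then start redis, the background scripts and the website.
    return [["poetry", "run", "stop"], ["poetry", "run", "start"]]


if __name__ == "__main__":
    for cmd in restart_commands(started_with_systemd()):
        print("would run:", " ".join(cmd))
```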

[screenshot]

Rafiot commented 3 years ago

Alternatively, you can just run start from within the poetry shell.

matt-ross16 commented 3 years ago

OK, I'll try being patient next time. I thought I had left it for about a minute, but I'll try again.

I'll also try running just start within the shell.

Rafiot commented 3 years ago

Yeah, it's weird. At this point you don't need to run update anymore; start alone should do it...

matt-ross16 commented 3 years ago

Unfortunately, poetry run start just hangs indefinitely.

[screenshot]

Rafiot commented 3 years ago

But the website also shuts itself down immediately, which is definitely the issue there.

The script is supposed to stop either if the shutdown key is set in redis or if the gunicorn process fails to start: https://github.com/Lookyloo/lookyloo/blob/0e14804a76e750d016429737296fce93da14af94/bin/start_website.py#L26
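The two stop conditions amount to a polling loop. A minimal sketch, not the code in start_website.py: the shutdown check is passed in as a callable (in lookyloo it would query the shutdown key in redis), and the supervised command stands in for gunicorn. The name supervise is hypothetical:

```python
import subprocess
import time
from typing import Callable, Sequence


def supervise(cmd: Sequence[str], shutdown_requested: Callable[[], bool],
              poll_interval: float = 0.5) -> str:
    """Run `cmd` until a shutdown is requested or the process exits on its own.

    Returns a short string saying why supervision ended.
    """
    process = subprocess.Popen(list(cmd))
    try:
        while True:
            if shutdown_requested():
                # Ask the child to stop cleanly, then wait for it.
                process.terminate()
                process.wait(timeout=10)
                return "shutdown requested"
            if process.poll() is not None:
                # The child died on its own (e.g. gunicorn could not bind its port).
                return f"process exited with code {process.returncode}"
            time.sleep(poll_interval)
    finally:
        # Never leave an orphaned child behind if something above raised.
        if process.poll() is None:
            process.kill()
            process.wait()
```

If a stale shutdown flag is left behind, shutdown_requested() is true on the very first iteration, so the website stops immediately, which matches the behaviour reported above.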

My guess is that the shutdown key is somehow still there... I see the following two options:

Then the redis DBs should be gone, and you can be 100% sure the shutdown key is no longer there.
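Removing the rdb files (the approach used later in this thread) can be scripted. A sketch assuming the redis dumps are *.rdb files in the cache and indexing sub-directories of the lookyloo checkout; those directory names are assumptions, so check where your redis configs actually write their dumps. The helper name is hypothetical:

```python
from pathlib import Path
from typing import Iterable, List

# Assumption: lookyloo's redis instances write their dumps in these sub-directories.
DEFAULT_REDIS_DIRS = ("cache", "indexing")


def remove_rdb_files(root: Path, subdirs: Iterable[str] = DEFAULT_REDIS_DIRS) -> List[Path]:
    """Delete *.rdb dump files under root/<subdir>; return the removed paths.

    Only run this while the redis servers are stopped; otherwise redis
    may write a new dump the next time it saves or shuts down.
    """
    removed: List[Path] = []
    for sub in subdirs:
        directory = root / sub
        if not directory.is_dir():
            continue
        for dump in sorted(directory.glob("*.rdb")):
            dump.unlink()
            removed.append(dump)
    return removed
```

For example, after poetry run stop: remove_rdb_files(Path("~/lookyloo").expanduser()).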

To check whether gunicorn is installed, run gunicorn --version from the poetry shell.
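The same check can be done from Python; run with poetry run python, it confirms gunicorn is installed in the project's virtualenv rather than only somewhere on the system PATH. A small sketch using the standard importlib.metadata module (Python 3.8+); installed_version is a hypothetical helper name:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package` in the current environment, or None."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("gunicorn")
    print(f"gunicorn {found}" if found else "gunicorn is not installed in this environment")
```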

matt-ross16 commented 3 years ago

I removed the rdb files like you mentioned, made sure gunicorn was installed, and made sure everything was stopped before running poetry run start again.

[screenshot]

AsyncScraper seems to be having an issue. It hangs there forever.

Rafiot commented 3 years ago

So that's something I renamed when I switched from scrape to capture: https://github.com/Lookyloo/lookyloo/commit/ea052c7c12499dbae0fcdfbd31ffb5c7c53f15cf#diff-4215265574b377c9578893c70a853aeef7bd9a2c270760e81d9754b512033d25

If you're getting this exception, it's because your code is not up to date.

Can you run git pull?

That said, at this point the website seems to be running, and you should be able to open http://0.0.0.0:5100 in your browser.

stale[bot] commented 3 years ago

Close call! This issue has been marked as stale because it has not had any recent activity. It will be closed if no further activity occurs. Add a comment or push a commit to keep this issue alive and kicking. Thank you for your contribution; it is appreciated.

PolarBearGod commented 3 years ago

Not sure if this is the right place for it, but I am also experiencing the issue described here. Unlike the other user, I still can't get poetry run start to bring the project up. Below is the output I am seeing:

username@lookylou:~/lookyloo$ poetry run start
Start backend (redis)...
done.
Start asynchronous ingestor...
done.
Start background indexer...
done.
Start website...
done.
[2021-04-29 16:40:06 +0000] [2729] [INFO] Starting gunicorn 20.1.0
[2021-04-29 16:40:06 +0000] [2729] [ERROR] Connection in use: ('0.0.0.0', 5100)
[2021-04-29 16:40:06 +0000] [2729] [ERROR] Retrying in 1 second.
[2021-04-29 16:40:07 +0000] [2729] [ERROR] Connection in use: ('0.0.0.0', 5100)
[2021-04-29 16:40:07 +0000] [2729] [ERROR] Retrying in 1 second.
04:40:08 BackgroundIndexer INFO:Initializing BackgroundIndexer
04:40:08 AsyncCapture INFO:Initializing AsyncCapture
04:40:08 Lookyloo WARNING:Unable to setup the PhishingInitiative module
04:40:08 Lookyloo WARNING:Unable to setup the PhishingInitiative module
04:40:08 MISP INFO:Module not enabled.
04:40:08 Lookyloo WARNING:Unable to setup the MISP module
04:40:08 UniversalWhois INFO:Module not enabled.
04:40:08 MISP INFO:Module not enabled.
04:40:08 Lookyloo WARNING:Unable to setup the MISP module
04:40:08 UniversalWhois INFO:Module not enabled.
04:40:08 Lookyloo WARNING:Unable to setup the UniversalWhois module
04:40:08 Lookyloo WARNING:Unable to setup the UniversalWhois module
04:40:08 BackgroundIndexer INFO:Launching BackgroundIndexer
04:40:08 AsyncCapture INFO:Launching AsyncCapture
[2021-04-29 16:40:08 +0000] [2729] [ERROR] Connection in use: ('0.0.0.0', 5100)
[2021-04-29 16:40:08 +0000] [2729] [ERROR] Retrying in 1 second.
[2021-04-29 16:40:09 +0000] [2729] [ERROR] Connection in use: ('0.0.0.0', 5100)
[2021-04-29 16:40:09 +0000] [2729] [ERROR] Retrying in 1 second.
[2021-04-29 16:40:10 +0000] [2729] [ERROR] Connection in use: ('0.0.0.0', 5100)
[2021-04-29 16:40:10 +0000] [2729] [ERROR] Retrying in 1 second.
[2021-04-29 16:40:11 +0000] [2729] [ERROR] Can't connect to ('0.0.0.0', 5100)
gunicorn stopped itself.
Shutting down website.

The Docker container (Splash) is up and running in another terminal session, as the documentation says.

username@lookylou:~$ sudo docker run -p 8050:8050 -p 5023:5023 scrapinghub/splash --disable-browser-caches
[sudo] password for username:
2021-04-29 16:31:58+0000 [-] Log opened.
2021-04-29 16:31:58.246151 [-] Xvfb is started: ['Xvfb', ':1006927673', '-screen', '0', '1024x768x24', '-nolisten', 'tcp']
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-splash'
2021-04-29 16:31:58.536702 [-] Splash version: 3.5
2021-04-29 16:31:58.768683 [-] Qt 5.14.1, PyQt 5.14.2, WebKit 602.1, Chromium 77.0.3865.129, sip 4.19.22, Twisted 19.7.0, Lua 5.2
2021-04-29 16:31:58.770277 [-] Python 3.6.9 (default, Jul 17 2020, 12:50:27) [GCC 8.4.0]
2021-04-29 16:31:58.772809 [-] Open files limit: 1048576
2021-04-29 16:31:58.773253 [-] Can't bump open files limit
2021-04-29 16:31:58.808967 [-] proxy profiles support is enabled, proxy profiles path: /etc/splash/proxy-profiles
2021-04-29 16:31:58.811577 [-] memory cache: disabled, private mode: enabled, js cross-domain access: disabled
2021-04-29 16:31:59.048778 [-] verbosity=1, slots=20, argument_cache_max_entries=500, max-timeout=90.0
2021-04-29 16:31:59.049665 [-] Web UI: enabled, Lua: enabled (sandbox: enabled), Webkit: enabled, Chromium: enabled
2021-04-29 16:31:59.052154 [-] Site starting on 8050
2021-04-29 16:31:59.054569 [-] Starting factory <twisted.web.server.Site object at 0x7f3560b43550>
2021-04-29 16:31:59.055017 [-] Server listening on http://0.0.0.0:8050

Any thoughts or help would be greatly appreciated.

Rafiot commented 3 years ago

The error message [2021-04-29 16:40:11 +0000] [2729] [ERROR] Can't connect to ('0.0.0.0', 5100) means that you already have something running on that port.

If the web service is already running, poetry run stop will stop it, and then poetry run start should work fine. But that won't help if something else is already using that port. What happens if you run telnet localhost 5100, or open http://localhost:5100 in your browser?
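The telnet test can also be done from Python with a plain TCP connection attempt. A minimal sketch with a hypothetical helper name: it only reports whether something accepts connections on the port, not which process owns it (for that, a tool such as ss -ltnp or lsof -i :5100 is needed):

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 5100 is the port the website tries to bind in the log above.
    print("port 5100 in use:", port_in_use(5100))
```

A server bound to 0.0.0.0 also accepts connections on 127.0.0.1, so checking localhost is enough for the gunicorn case above.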

PolarBearGod commented 3 years ago

I ran a full netstat but couldn't find anything bound to 5100. I opted to reboot my VMware host just to be safe. Everything stays up now, but I have an internal investigation ahead to figure out what had 5100 open in the first place.

Thank you so much for the help.

Rafiot commented 3 years ago

Weird, but glad it works now.