Closed: chris-counteractive closed this issue 3 years ago.
Just pushed #761, which implements a new command-line option, import. It must be run on the server at the moment:
export VELOCIRAPTOR_CONFIG=....
velociraptor import --hostname XXX Collection-DESKTOP-25CK4TB-2020-11-20_00_34_23_-0800_PST.zip --create
This will create a new host with hostname XXX (or use the first client with that hostname) and upload the offline collection to it. You should be able to see the collection, with a new flow id, in the normal flows menu.
Please give it a try and see if it works for your use case.
Works like a charm, sir, thank you very much, it gets the job done! I like the option to specify either a client ID or a hostname, and I like that it'll work with "legacy" collections (i.e., doesn't require a particular artifact having been in the bundle). This definitely meets our basic need - if we're working a response or hunt, and there are systems that have been pulled offline for containment, say, we can collect offline and roll them into the overall data without separate procedures for the collected zips. Marvelous.
Some food for thought as you decide how/whether to keep enhancing this:

- Hostname: the hostname is embedded in the default collection filename (e.g., .\Collection-myhostname-2020-11-20_10_15_18_-0600_CST.zip), so it's straightforward to get it that way, but that feels a bit brittle - the user might rename the file, etc. Not sure it's worth the extra effort (see above - this works fine :)), but you could imagine pulling it from the collection itself if available (e.g., "if you see Generic.Client.Info, use the data from that") or even pulling it from the filename (e.g., "if the filename matches this regex, treat this capture group as the hostname"), if the --hostname flag isn't present. (edit: this also gives some protection against fat-finger typos - which are sometimes just annoying, but sometimes can collide with another existing host, e.g., when folks use hostnames with serial numbers or incrementing IDs)
- Metadata: it would be nice to capture collection metadata where available (e.g., a Created timestamp). It also aligns with the idea of making these collections as indistinguishable from "normal" collections as possible.

Thanks again, I'm always knocked out by your responsiveness.
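For what it's worth, the filename-regex fallback could be as small as this sketch (Python; the pattern and helper name are assumptions modelled on the default filenames quoted in this thread, not Velociraptor code):

```python
import re
from typing import Optional

# Hypothetical fallback: derive the hostname from the default offline-collector
# filename, Collection-<hostname>-<timestamp>.zip. The regex is an assumption
# based on the filenames seen in this thread.
PATTERN = re.compile(
    r"^Collection-(?P<host>.+)-\d{4}-\d{2}-\d{2}_\d{2}_\d{2}_\d{2}_.*\.zip$"
)

def hostname_from_filename(name: str) -> Optional[str]:
    """Return the embedded hostname, or None if the file was renamed."""
    m = PATTERN.match(name)
    return m.group("host") if m else None

print(hostname_from_filename(
    "Collection-DESKTOP-25CK4TB-2020-11-20_00_34_23_-0800_PST.zip"
))  # DESKTOP-25CK4TB
```

Since the capture group is greedy, hostnames containing dashes or dots (like DESKTOP-25CK4TB.localdomain) still come out whole, because the date portion anchors the end.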
The offline collections don't seem to be viewable at the moment for me. When I click on the flow id, it just comes up blank.
@clayscode it seems that it failed to recognize any artifacts in your zip file (the artifacts column is empty) - did the zip file contain custom artifacts? Was the importing binary run with the server config (so it can find the custom artifact definitions)?
@chris-counteractive Velociraptor usually treats the hostname as an identifier that is indexed so we can search on it easily to find the client id - so it doesn't necessarily have to be an actual DNS name. You can always specify --hostname "Bobs Machine"
just as long as you can search for it later. The client id just groups related collections under what we believe is a unique machine - it may in fact be that two client ids are really the same box, but we have no way to know that.
It depends how you want to manage the collection - Velociraptor does not really make any assumptions that collections are related to each other; that is open for interpretation. So it might not make a lot of difference if the real client id is the same as the offline client id. I think the main advantage of this feature is being able to use the notebook to post-process collected results, so we don't have to resort to external JSON manipulation tools like jq or miller. For example, we can collect the MFT, post-process it, filter it, remove some columns, and re-export a smaller CSV file of relevant data simply by importing the offline collection, accessing the flow notebook (which is created automatically for every flow collection), adding some WHERE filters and column specifications (or using the GUI to show/hide columns), and then clicking the export-to-CSV button.
Perhaps a related but different feature is to merge two client ids together - so if it becomes apparent later that the clients are somehow the same, we can just merge them to the same client id to avoid confusion.
As for the last-active metadata - it is not filled in because we have no idea when the collection was actually taken. I didn't want to add a hard requirement for a metadata file in the collection, but maybe make it optional? That way the high-level metadata can be added at collection time and just extracted at import time if possible.
What information do you think we would like to add? I was thinking of:
From a design POV we can simply create an additional artifact that will be collected automatically with every offline collection - then the importer can just look for that artifact and use it to populate the flow data.
Does the server need to know about the custom artifact definitions? I assumed it just grabbed the filename or something and used that as the artifact name. The collection is a nested artifact, with the top-level artifact as the folder name and the sub-artifacts as the JSON files (e.g. Custom.Example.Artifact/Custom.SampleArtifact.json). If I need to rename things or put artifacts in their own folders I can do that.
Currently there is no structure in the offline collector zip - we just have a bunch of files, some can be artifact result sets and some can be uploaded files. The importer uses the name to figure out if the file is an upload or an artifact result - so this is why it needs to recognize the names.
We generally want to have a link between the result set JSON and the artifact that generated it - the GUI can use it for annotating column types (like timestamps etc). So we probably don't want to blindly load result sets without knowing the artifact that generated them.
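A rough sketch of that naming-based decision (in Python, not Velociraptor's actual implementation; the artifact names and the path layout below are invented for illustration):

```python
# Members whose path looks like "<KnownArtifact>/<SourceName>.json" are treated
# as artifact result sets; anything else is treated as an uploaded file.
# The artifact set here is a hypothetical example.
KNOWN_ARTIFACTS = {"Generic.Client.Info", "Custom.Generic.Client.Info"}

def classify(member_name: str) -> str:
    """Classify a zip member as an artifact result set or an uploaded file."""
    parts = member_name.split("/")
    if len(parts) == 2 and parts[0] in KNOWN_ARTIFACTS and parts[1].endswith(".json"):
        return "result_set"
    return "upload"

print(classify("Generic.Client.Info/BasicInformation.json"))  # result_set
print(classify("Custom.Unknown/Results.json"))                # upload
```

This also makes the failure mode discussed in this thread concrete: if the importer has never seen the artifact definition, the name lookup fails and the JSON falls through to the "upload" branch.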
Is the issue that you are trying to import a collection produced by a different installation of Velociraptor without the custom artifacts? Or are you having trouble importing custom artifacts that should be recognized?
So I've added my custom artifacts to the velociraptor server but it's still not recognizing the offline collection. I imagine the issue is that my artifact looks like this:
sources:
- name: Example
queries:
- SELECT *,uuid() AS UUID FROM Artifact.Custom.Example()
That's interesting - the artifact you describe has a named source, which changes the way it is written to the zip. I tested by creating a collector collecting Generic.Client.Info, which also has named sources. The produced zip file contains the following:
$ unzip -l /shared/Collection-DESKTOP-25CK4TB.localdomain-2020-11-23_06_58_13_-0800_PST.zip
Archive: /shared/Collection-DESKTOP-25CK4TB.localdomain-2020-11-23_06_58_13_-0800_PST.zip
Length Date Time Name
--------- ---------- ----- ----
313 1980-00-00 00:00 Generic.Client.Info/BasicInformation.json
1054 1980-00-00 00:00 Generic.Client.Info/Users.json
--------- -------
1367 2 files
and importing it works correctly.
I then customized the artifact to create Custom.Generic.Client.Info, repeated the process, and it worked as well.
I imported the collection using
$ ./output/velociraptor-v0.5.2-linux-amd64 --config ~/server.config.yaml import --hostname XXX /shared/Collection-DESKTOP-25CK4TB.localdomain-2020-11-23_07_04_42_-0800_PST.zip
Importing zip file /shared/Collection-DESKTOP-25CK4TB.localdomain-2020-11-23_07_04_42_-0800_PST.zip into client id C.9ca8a7c498264e89
Filename Custom.Generic.Client.Info/BasicInformation.json
Filename Custom.Generic.Client.Info/Users.json
Can you please attach the output of the import command?
@scudette thanks for the follow-up! I totally understand velociraptor's not making any guarantees with respect to hostname (just treats it as a label) or even client ID, and I'm completely content with the functionality as it stands - using it for unified notebook analysis is certainly a key driver. It's also nice in our use-case to be able to go to the velociraptor server as the central store for collected "raw" artifacts (e.g., from KapeTriage), and this new import feature will keep that much more consistent and organized.
It perhaps wasn't clear in my edited comment, but the "automated pulling of hostname" idea was less about changing the semantics or guarantees of hostnames in VR, more about avoiding inadvertent collisions when doing manual entry. We've had cases where systems are named with the last few digits of their service tag number, for example, and if there's both a financehost0105 and a financehost0IO5, pulling it from the filename or an optional collected artifact would help avoid accidentally associating one host's offline collector results with the other.
Merging does seem to be a feature that would render most of these decisions lower-impact: if you can, ex post facto, decide to arbitrarily combine various collections, that empowers the user to solve a lot of the edge cases. Probably a more significant undertaking though - not sure the juice is worth the squeeze, but it would be nice.
> I didn't want to add a hard requirement for a metadata file to be added to the collection but maybe make that optional? this way the high level metadata can be added at collection time and just extracted at import time if possible.
Totally sensible, I like keeping it backwards compatible with previous offline collections. But if the data's there, it'd be nice to use it 😃
> From a design POV we can simply create an additional artifact that will be collected automatically with every offline collection - then the importer can just look for that artifact and use it to populate the flow data.
Absolutely, yes sir - that's what I had in mind in the notional story up top of this thread. I think Generic.Client.Info already gets you most of the way there - it has most of the items you suggest, though not the collection-time metadata.
Speaking of collection time metadata, I was reviewing some previous test imports and I noticed some wacky times in the logs (note the date, 52858-08-08):
I can open a separate issue for that if you like. Thanks again!
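For what it's worth, dates in the 52858 range are the classic symptom of a milliseconds-since-epoch value being interpreted as seconds. A hypothetical illustration (the timestamp below is invented, not taken from the actual logs):

```python
from datetime import datetime, timezone

# A millisecond epoch from November 2020 (invented example value).
ts_ms = 1605900000000

# Interpreted correctly (divide by 1000 first):
print(datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).year)  # 2020

# Interpreted as seconds, the same number lands ~51,000 years in the future.
# datetime itself overflows at year 9999, so approximate with the mean
# Gregorian year length (365.2425 days):
print(int(1970 + ts_ms / (365.2425 * 86400)))  # 52858
```

That the bogus year matches the one in the logs suggests a seconds/milliseconds (or similar unit) mix-up somewhere in the timestamp handling, though that's only a guess from the outside.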
Hmm no dice. Will try the latest CI build.
velociraptor --config /etc/velociraptor/server.config.yaml import --hostname TestHost Collection.zip
Importing zip file Collection.zip into client id C.a7297848191d149b
Filename Custom.Test/ClientInfoA.json
Copying file Custom.Test/ClientInfoA.json -> /clients/C.a7297848191d149b/collections/F.BUUI1M6H0HGEM/uploads/file/Custom.Test/ClientInfoA.json
Filename Custom.Test/ClientInfoB.json
Copying file Custom.Test/ClientInfoB.json -> /clients/C.a7297848191d149b/collections/F.BUUI1M6H0HGEM/uploads/file/Custom.Test/ClientInfoB.json
Ah, it's working now after I created the artifact on the server. My other collection refuses to import though, even though I imported my artifacts to the server...
Interesting - after recreating the artifacts in the GUI instead of just loading them with /usr/bin/velociraptor --config /etc/velociraptor/server.config.yaml frontend -v --definitions=/artifacts, it recognizes my artifacts now.
Ah, that makes sense - if you keep your artifact definitions in another directory you will need to also load that directory when importing the zip file (it is the importing process that needs to learn about all the definitions). In that case you just need to also specify the --definitions flag to the import command.
Hmm, that still doesn't seem to be working. Same issue with it coming up blank in the server even though I'm specifying my definitions folder on import.
Ah good point - thanks for testing it. It should work in the latest CI build (and will be in 0.5.3)
I am going to close this issue since the basic capability is there - please open a new issue if we need to improve it more.
Currently results created using offline (or "stand-alone") collectors cannot be integrated with results in an existing server datastore. For certain use-cases it would be very useful to treat these results as "first-class" within the GUI, opening them up to analysis (e.g., with notebooks) and reporting and export, the same as their online peers.
Put another way: treat offline-collected clients the same as the rest, just with a very high latency connection.
notional story
There are design decisions involved, and edge cases to be solved, but this might be within reach after a brief discussion on Discord. Consider the following possible workflow as a start to the conversation:
- Build the offline collector to include Generic.Client.Info with an additional optional parameter, ClientID.
- If ClientID is unspecified (the default), a new, unique Client ID will be generated as if this were a "normal" online client, and will be used on import to create the requisite metadata to display it in the GUI as if it were an online client.
- If ClientID is specified, the results will be merged with the existing client on import.
- Optionally, provide a way to pick the ClientID from among existing clients (e.g., by hostname).
- Import with something like velociraptor import --datastore /path/to/datastore --collector-results /path/to/results.zip (edit: or just specify a server config rather than datastore location, whatever is most consistent) ... with corresponding options for specifying cloud collector locations, or ...

edge cases

- Collector run on a machine that is a client already: repurpose the "running system" ID? Or do nothing outside the normal workflow ("it's just another process ...")?

Thanks for the discussion!