Closed apocalyptech closed 5 years ago
Actually, I suppose the following in the `World` class might be somewhat useful. It's still rather inefficient to have to loop through all the keys in the world, if you're just looking for a single entity, but something like this should do:
# Needs `import io` and `import struct` at module level.
@property
def uuid_to_region_map(self):
    """
    A dict whose keys are the UUIDs (or just IDs, in some
    cases) of entities, and whose values are the `(rx, ry)`
    coordinates in which that entity can be found.
    """
    # Compare against None so an empty map isn't rebuilt on every access
    if self._uuid_to_region_map is None:
        self._uuid_to_region_map = {}
        for key in self.get_all_keys():
            (layer, rx, ry) = struct.unpack('>BHH', key)
            if layer == 4:
                stream = io.BytesIO(self.get(layer, rx, ry))
                num_entities = struct.unpack('B', stream.read(1))[0]
                for i in range(num_entities):
                    strlen = struct.unpack('B', stream.read(1))[0]
                    uuid = stream.read(strlen).decode('utf-8')
                    if uuid in self._uuid_to_region_map:
                        # TODO: err? Should we do this?
                        raise Exception('UUID {} found in two regions'.format(uuid))
                    else:
                        self._uuid_to_region_map[uuid] = (rx, ry)
    return self._uuid_to_region_map
Two things I'm unsure of with that:
Should `utf-8` be used for decoding? I suspect that they never get beyond `latin1`.
If they're not `latin1`, does the length value indicate the number of bytes or the number of characters? (I'd assume the former, but I'd also assume that the question's moot 'cause probably the IDs never get beyond ASCII.)
Anyway, something like that would at least enable apps to do something like:
for mark in bookmarks:
    if mark.uuid in world.uuid_to_region_map:
        entities = world.get_entities(*world.uuid_to_region_map[mark.uuid])
        for entity in entities:
            if ('uniqueId' in entity.data and entity.data['uniqueId'] == mark.uuid):
                (ex, ey) = entity.data['tilePosition']
                print(' * Found bookmark "{}" in {}, at coords ({}, {})'.format(
                    mark.name,
                    filename,
                    ex,
                    ey
                    ))
    else:
        print(' * Bookmark "{}" not found in {}'.format(
            mark.name,
            filename,
            ))
I could PR that function if it's something you think you might want in the project, though I admit that even better would be a way to figure out those layer-3 lookups so we're not looping through all keys.
This is great! I’ll read through this properly later. Could I ask you to take a glance at the `FORMATS.md` file to see if it needs some freshening up? It seems things have changed and you have a lot more up-to-date knowledge than I do, and I would like there to be a central source of truth for this stuff to help others work on Starbound files more easily.
Regarding the unknown data: that looks like floats to me. I tried one of them ("Home Base" in -452761926-908966428-252630074_5.world) and the coordinates appear to match up:
>>> struct.unpack('>ff', b'\x45\x3E\xF0\x00\x44\x7F\x80\x00')
(3055.0, 1022.0)
Oh, hah! Yeah, of course they're floats. I've even seen plenty of coordinates as floats in the data elsewhere, too.
And sure, I've been thinking I should probably update the README as well with a couple of the new methods, though I'd wanted to make sure those settle down a bit first. I don't think there's anything in FORMATS that's wrong, though there's probably some stuff that could be expanded. I'll take a look in a bit!
So, I'm finally getting back to this stuff a bit. The above `uuid_to_region_map` property has been working quite well for my own purposes; still not sure if it's something that necessarily belongs in the base class or not.
I've got some other info that I'm aggregating into my own `World` class. Perhaps some of this stuff might be useful to have in the real class? Let me know if so and I can put together a PR for it:
import os
import starbound

class World(starbound.World):
    """
    Overloaded World class, originally to provide some data-access methods
    I'm not sure really belong in the main py-starbound World class,
    but now also to provide some of our own data correlation stuff which
    almost certainly *doesn't* belong in the py-starbound World class.
    """

    def __init__(self, df, filename):
        super().__init__(df)
        self._uuid_to_region_map = None
        self.filename = filename
        self.base_filename = os.path.basename(filename)
        self.types = set()
        self.biomes = set()
        self.dungeons = set()
        self.coords = (0, 0)

    def read_metadata(self):
        """
        Reads metadata, and collates some information in the metadata
        structure, for ease of use later.
        """
        super().read_metadata()
        if 'worldTemplate' in self.metadata:
            wt = self.metadata['worldTemplate']
            if wt:
                if 'celestialParameters' in wt:
                    cp = wt['celestialParameters']
                    if cp and 'parameters' in cp and 'terrestrialType' in cp['parameters']:
                        self.types = set(cp['parameters']['terrestrialType'])
                    if cp and 'coordinate' in cp and 'location' in cp['coordinate']:
                        self.coords = (
                            cp['coordinate']['location'][0],
                            cp['coordinate']['location'][1],
                        )
                if 'worldParameters' in wt:
                    wp = wt['worldParameters']
                    if wp:
                        for key, layer in wp.items():
                            if layer and (key.endswith('Layer') or key.endswith('Layers')):
                                # A '...Layer' key holds one layer; '...Layers' holds a list
                                if key.endswith('Layer'):
                                    layerlist = [layer]
                                else:
                                    layerlist = layer
                                for layer in layerlist:
                                    for dungeon in layer['dungeons']:
                                        self.dungeons.add(dungeon)
                                    for label in ['primaryRegion', 'primarySubRegion']:
                                        region = layer[label]
                                        self.biomes.add(region['biome'])
                                    for label in ['secondaryRegions', 'secondarySubRegions']:
                                        for inner_region in layer[label]:
                                            self.biomes.add(inner_region['biome'])
Cool! I would say that the O(1) things are fine to add if they're useful in a general sense.
However, I would prefer that anything else only be calculated on an as-needed basis. As a rule of thumb, this keeps a tool that, e.g., displays a list of worlds from slowing down as we add more and more conveniences. Same goes for individual attributes: one tool might only care about the types, and another may only care about dungeons, and without a sane separation of concerns, things will get slower as more useful data fields get added.
It might also be worth considering putting some of these into a `WorldInspector` class instead of using inheritance. And then try to split out reading for specific values, but still grouping relevant loops to avoid unnecessary overhead when you do need all values:
class WorldInspector(object):
    def __init__(self, world):
        self.world = world
        self._biomes = None
        self._coords = None
        self._dungeons = None
        self._types = None

    @property
    def biomes(self):
        if self._biomes is not None:
            return self._biomes
        self._read_world_parameters()
        return self._biomes

    @property
    def coords(self):
        # Same but self._read_celestial_parameters()
        # ...
        raise NotImplementedError  # placeholder so the sketch parses

    def _read_world_parameters(self):
        self.world.read_metadata()
        if 'worldTemplate' in self.world.metadata:
            self._biomes = set()
            self._dungeons = set()
            # Update _biomes and _dungeons here since they're in the same pass...

    # And same for _read_celestial_parameters...
I prefer this since you're not really expanding/modifying the behavior of `World`, only creating a new type of utility.
Sure thing, I'll get that wrapped into a PR in a bit, then!
So this feels a bit like overkill, really; I feel vaguely like nearly all of this would be perfectly fine happening automatically instead of on-demand, but here's a version where literally everything's on-demand, regardless: https://github.com/apocalyptech/py-starbound/commit/d3cf7bd04f84705fae86b719b788db2871f74e76
Look okay? As I say, seems a bit overkill, but it works well. IMO it makes sense to shove in the introspection class to the world object, rather than forcing users to keep track of it themselves.
If you're okay with this approach, I'll add in some documentation for all that and get it PR'd properly! :)
Hey @apocalyptech I agree it looks a bit overkill, but I actually think this is a great start that can be done very similarly without the overkill part :) I took a stab at it here: https://github.com/blixt/py-starbound/compare/world_info (I don't have Starbound installed so I couldn't test that it all works fine, sorry!)
Let me know what you think. Some notes:
First, to avoid circular references I made `WorldInfo` only care about the metadata `dict`, so it no longer keeps a reference to the `World`. This also showed that `get_entity_uuid_coords` doesn't belong in the separate `WorldInfo` class (because it's actively inspecting the BTree), so I moved that back into `World`.
Overall, since I foresee more things getting added to `WorldInfo`, I created a simple `lazyproperty` descriptor to avoid the maintenance cost of keeping `__init__` and the `_property` in sync (this also makes the code more readable by reducing `if` nesting).
Then I made the `celestialParameters` and `worldParameters` private lazy properties that return a named tuple with all the information they represent. This makes extending them and also accessing them from other properties much easier.
Finally I did some PEP8 cleanup and potential bug fixes. I noticed you were reading a single byte, and then a stream of bytes and decoding it as UTF-8. I presume this is actually a standard "SBON" string, which means that first byte is actually a variable length integer, which would mean it could break in some cases if we assume it's only a single byte. I also assumed that the number of entities is also a variable length integer instead of a single byte. Let me know if this is actually wrong and breaks in your testing!
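For reference, a minimal sketch of what reading such a length-prefixed string could look like. It assumes the length prefix is a big-endian base-128 varint (high bit set means another byte follows), which is my understanding of how SBON encodes it; `read_varint` and `read_string` are illustrative helpers written for this example, not necessarily the library's own functions.

```python
import io


def read_varint(stream):
    """Read an unsigned big-endian base-128 varint from a binary stream."""
    value = 0
    while True:
        raw = stream.read(1)
        if not raw:
            raise EOFError('stream ended in the middle of a varint')
        byte = raw[0]
        # The low seven bits carry data; the high bit flags a continuation.
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value


def read_string(stream):
    """Read a varint byte count, then that many bytes decoded as UTF-8."""
    length = read_varint(stream)
    return stream.read(length).decode('utf-8')


stream = io.BytesIO(b'\x03abc')
print(read_string(stream))
```

For lengths below 128 the varint is a single byte, so a 32-character UUID reads identically either way, which would explain why the single-byte version has worked in practice.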
Hello! Yeah, that looks good, though it requires a bit of tweaking. As written, the `lazyproperty` class can't handle having more than one `World` object -- or rather, as soon as a property's been read from a single World's metadata, it'll continue returning that data for any other World as well. To solve that, the `lazyproperty` class would either have to store its info using an internal dict of some sort, or just load attributes into the proper object itself. I opted for the latter since that way the data for a World gets kept inside the object itself. Feels a bit hacky but should be fine; used the attribute name `_lazyproperty_<funcname>` to store 'em.
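For anyone following along, a rough sketch of the per-instance approach described above. The `WorldInfo` class and its `biomes` body here are simplified stand-ins for illustration, not the code from the branch; only the `_lazyproperty_<funcname>` attribute naming comes from the description above.

```python
class lazyproperty:
    """Descriptor that computes a value once per instance and caches it there."""

    def __init__(self, func):
        self.func = func
        self.attr_name = '_lazyproperty_' + func.__name__

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class itself, not on an instance.
            return self
        if self.attr_name not in instance.__dict__:
            # Cache on the instance, so each object keeps its own value.
            instance.__dict__[self.attr_name] = self.func(instance)
        return instance.__dict__[self.attr_name]


class WorldInfo:
    """Hypothetical stand-in that only looks at a metadata dict."""

    def __init__(self, metadata):
        self.metadata = metadata

    @lazyproperty
    def biomes(self):
        return set(self.metadata.get('biomes', []))


first = WorldInfo({'biomes': ['forest']})
second = WorldInfo({'biomes': ['desert']})
print(first.biomes, second.biomes)
```

Storing the cached value on the descriptor itself would share it across every instance of the class, which is the bug described above; keying the cache by attribute name on the instance avoids that.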
The other problem I encountered was that `_celestialParameters` and `_worldParameters` were attempting to alter the namedtuples, which was failing 'cause of tuple immutability. Reworked those very slightly to compensate.
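To illustrate the immutability issue: a namedtuple's fields can't be assigned after construction, so one workaround is to collect the values in local variables first and build the tuple once at the end. The `WorldParameters` type and `collect` function below are made-up names for this sketch, and the metadata layout is simplified.

```python
from collections import namedtuple

WorldParameters = namedtuple('WorldParameters', ['biomes', 'dungeons'])


def collect(layers):
    """Gather biomes and dungeons in one pass, then build the tuple once."""
    biomes = set()
    dungeons = set()
    for layer in layers:
        dungeons.update(layer.get('dungeons', []))
        biomes.add(layer['primaryRegion']['biome'])
    return WorldParameters(biomes=biomes, dungeons=dungeons)


params = collect([{'dungeons': ['mine'], 'primaryRegion': {'biome': 'forest'}}])

try:
    params.biomes = set()  # Assigning to a namedtuple field is not allowed.
except AttributeError as exc:
    print('immutable:', exc)
```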
Both those changes can be seen here: https://github.com/blixt/py-starbound/compare/world_info...apocalyptech:world_info?expand=1
Good catch re: the varint stuff, I'm sure you're right about that. No errors reading any of my own world files, so I think we're good there.
Ah of course, how silly of me :/ That's what I get for not testing my code and you can tell I haven't touched Python for a while now! Thanks :)
If you're happy with this version then go ahead and squash it and PR it, otherwise I'm happy to do it too.
BTW for future commits, a few notes on code style that I'd like to try and follow in this repo:
Sounds good! I'll get some docs added as well, and bundle that into a PR once that's ready.
Also sounds good re: style -- I admit I tend to be a bit laissez-faire with my own commits. :)
I'm putting this one in here more for documentation and brainstorming than anything else, in case someone else feels like puzzling this out. I've got it Good Enough for my own purposes, and was planning on just implementing what I've got in my own app. It'd be nice to have a "real" API for this, though whether it's worth the work is another question.
Anyway, I wanted to find out how to find user bookmarks, though this seems to also be a method to (probably) find where any Entity is, based on UUID (though I haven't really looked into that too far). The starting point for bookmarks, at least, is in the player JSON data, underneath `universeMap.(uuid).teleportBookmarks`, and includes the following data:
The colon-separated string contains one of:
- `CelestialWorld` entries point to the `universe/*.world` files, as you'd expect,
- `InstanceWorld` entries point to `universe/unique-<filename_part>-<instance_uuid>-<suffix_num>.world` files, and
- `ClientShipWorld` points to `player/<player_uuid>.shipworld`.
Notably absent from that info is the actual world coordinates for the bookmark - for that we need to look at the map data itself. One brute-force way would be to loop through all regions, `get_entities` on them, and look for entities whose `uniqueId` attribute matches the bookmark UUID (and then use that entity's `tilePosition`).
To avoid having to loop through all entities in the map, it looks like the map data types/layers 3 and 4 could be used. Layer 4 is the simplest, and seems to be a simple map of all entity UUIDs in a region, so you can get those with just `world.get(4, rx, ry)`, though of course finding that would require looping through all keys, since you don't know the `rx, ry` yet. The data itself in there starts out with a single unsigned byte which is the number of IDs included in the data, and then a length-prefixed string for each ID (the length is another single unsigned byte - ordinarily `0x20`
/32 for UUIDs).
Layer 3 is a bit more complex, and I suspect that this is the one which is intended to be used for this sort of lookup. My first big unknown is that I have no idea what the key actually represents. The layer/type is 3, of course, but I've yet to figure out what in the world the other eight bytes are meant to describe. It's definitely not just `rx, ry`, and may not even be considered two shorts.
As for the data inside layer 3 nodes, it starts off with an unsigned short specifying the number of entries. Each entry starts out with a length-prefixed string, as with the layer 4 entry. Then there's two shorts which give you the region X, Y, and a further eight bytes I don't have a clue how to interpret.
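Going purely by the description above, parsing a layer 3 node might look something like the following sketch. It assumes big-endian values (as with the keys) and single-byte string lengths, and `parse_layer3_node` is just a name made up for this example. The trailing eight bytes are returned raw; elsewhere in this thread it's suggested that they look like two big-endian floats, which is what the loop at the end tries.

```python
import io
import struct


def parse_layer3_node(data):
    """Parse the raw bytes of one layer 3 node into a list of entries."""
    stream = io.BytesIO(data)
    (count,) = struct.unpack('>H', stream.read(2))
    entries = []
    for _ in range(count):
        (strlen,) = struct.unpack('B', stream.read(1))
        uuid = stream.read(strlen).decode('utf-8')
        (rx, ry) = struct.unpack('>HH', stream.read(4))
        unknown = stream.read(8)  # Not yet understood; kept as raw bytes.
        entries.append((uuid, rx, ry, unknown))
    return entries


# Hand-built sample data, not taken from a real world file.
sample = (
    struct.pack('>H', 1)
    + bytes([4]) + b'abcd'
    + struct.pack('>HH', 95, 31)
    + b'\x45\x3E\xF0\x00\x44\x7F\x80\x00'
)
for uuid, rx, ry, unknown in parse_layer3_node(sample):
    print(uuid, rx, ry, struct.unpack('>ff', unknown))
```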
So right now, for both of those layers, the best I can manage is looping through all the keys in the DB, matching on the UUID, and then using the region X,Y to grab entities in that region, and match on the UUID in there as well. I assume there's got to be a way to generate the keys for layer 3 data using what exists in the Player bookmark data, but I've not figured that out yet. Among the things I tried was checking to see if the key happened to be a crc32 of the UUID, but no such luck there.
I know this is already super long-winded, but for thoroughness's sake, here's various bookmarks and their layer 3 keys/unknown data: