Improve named meta format

covercash2 commented 5 years ago

sorry, i'm the author of the named packer, but it's not working. because the packer changes the order of the sprites and doesn't retain information about how that order is changed, i had to go in and add fields to a lot of intermediate data types, which really messes with the original architecture but works for my purposes. i basically attached the file name to every intermediate data type. you can look at my quick and dirty changes here.

i didn't want to open a pull request yet, since, like i said, it's pretty dirty and pokes some holes in the original architecture. i just wanted to get some thoughts on how to change this approach so it could work.

happenslol commented 5 years ago

Sorry for leaving this open for so long - getting a proper packing algorithm in place was my first priority and I didn't have much time for this project anyways, so I focused on that when I did.

However, now I think it would be great to have a different format, since the bigger the sheets become, you don't really want to look through the meta files to find out what index the sprite you want to use landed at. So, I'll be looking into this soon, and see how we can adapt your fixes to the new architecture.

It might make sense to hash the images, and add all filenames belonging to a hash to an entry in a hashmap - that way, we would get aliasing of identical sprites for free since the image for each hash would only be included once, but potentially be referenced by multiple filenames and thus meta entries.

basil-cow commented 5 years ago

I think I'll try to tackle this as well as aliasing same textures both are using hashing

basil-cow commented 5 years ago

@covercash2 isn't sprite anchors id the position of the sprite in the input vector (and position of it's name in the names vector)?

happenslol commented 5 years ago

Yes, the position in the input is used as an id for now since it's always unique. A hash of the image would be better here, as already discussed here and in #12. Happy to provide any other pointers if you need them!

basil-cow commented 5 years ago

Do we want to give users ability to find a sprite in spritesheets? I don't see why would you want that, user already can find their sprite (and all other info, like path) given an anchor. Presumably users don't care in what order they get to serialize sprites, right? I just don't think there is a need in hashing besides aliasing.

happenslol commented 5 years ago

What do you mean by "find a sprite in spritesheets"? I see 2 cases where the user would need some kind of identifier:

When using the library directly, the user needs to relate the anchors he receives as a result back to his sprites. We can use any id here, as long as it's the same in the input and the output. (I would probably prefer a hash since we don't lose much by doing that, and it's easier to debug and reason about.)
When the user uses the CLI, and needs to refer back to his sprites in the meta file. This is where we use the filename. Since multiple filenames can point to identical images, we would still preserve these filenames, and simply point them to the same image (since the user can't know which one we'll get rid of if we decide to leave one out). For this purpose, it makes sense to build a map of hash -> filename, which can then later be used to create the meta file.

The order shouldn't (and can't) matter in any case, since we're free to change the order at any point for optimization purposes. Offline packing is out of scope for now, the user hands us a bunch of files at once and we hand back a corresponding amount of anchors at once.

basil-cow commented 5 years ago

Why can't we say that id is the index of sprite in the input vector and that won't change? That way, user can find the sprite corresponding to an anchor as well as all the info user needs (filenames and such). As for aliasing, we just create several anchors with same contents but different id's (we use hashing to find aliases, of course, but don't expose anything related to hashing to the user).

A small drawback is that user won't know if there are any aliases, but I don't think anyone would need that for formatting.

basil-cow commented 5 years ago

I made an implementation attempt at https://github.com/Areredify/sheep/tree/hashing, you can check it out

happenslol commented 5 years ago

Generally, I'd be against it because it adds another layer of abstraction (hash -> index -> data instead of hash -> data) and won't bring us much performance wise. If your implementation cleanly deals with that and/or there are performance issues with the hash-only variant, I'm open for that of course.

I'll get back to you when I've had the time to look at your implementation more closely.

happenslol commented 5 years ago

@Areredify Alright, I've had a chance to look at it now.

Looks good all in all, however I think the way you've implemented the aliasing is a little hard to read, imo. I think by just maintaining a list of the sprites and then iterating over the hashes, you make the code a whole lot simpler:

let mut hashes: HashMap<&[u8], Vec<usize>, BuildHasherDefault<XxHash64>> = Default::default();

for (id, sprite) in input.iter().enumerate() {
  let mut ids = hashes.entry(sprite.bytes.as_slice()).or_insert(Vec::new());
  ids.push(id);
}

// ... iterate over hashes here, no need to filter antyhing

I do see the problem (which you were trying to avoid with your implementation) of having a whole lot of 1-element vecs in the optimal path, which is that there are no aliases. I think this could easily be avoided by using SmallVec, which is essentially a more opaque implementation of what you were doing before.

I do have one more idea though, concerning the "extra abstraction layer" thing I mentioned before: I feel like letting the user pass in a generic ID as part of the InputSprite struct would be best. This is what it would look like:

struct InputSprite<I = usize> {
  pub id: I,
  pub bytes: Vec<u8>,
  pub dimensions: (u32, u32),
}

The user could the either pass in IDs, filenames or anything else and we would simply use these as the unique identifier for each sprite throughout the process. This allows us to stay with usize for the internal usecase, while allowing maximum flexibility for consumers of the lib. What do you think about that?

Edit: And you can definitely already open a PR for this, that would make it easier to comment and iterate on the code =)

basil-cow commented 5 years ago

As for the first part, your idea is much cleaner than mine, not sure why I didn't think of that, I'll open a PR once I'll implement it. For the second part, I am not sure what do you mean exactly, can you provide an example how would a named packer, for example, look like?

basil-cow commented 5 years ago

I figured out what you are trying to say, I don't really like having a HashMap where there a Vec could suffice, but I guess there is no real performance tradeoff, and usability would be better.

basil-cow commented 5 years ago

I think by just maintaining a list of the sprites and then iterating over the hashes, you make the code a whole lot simpler:

I now found out there are two issues with this. First issue, that way you can't move the sprites out of input, you need to copy them. Second issue, you actually need a way to get all the aliases for the sprite later, so you need something like let aliases: Vec<Vec<usize>>

I also tried to remove the "extra abstraction layer", it turns out you need to sprinkle just about every function in every file with <I> to be generic over index. Doesn't really make sense why would we need to be generic over an index in a function that compares rectangle areas, for example. I think a better solution is to just pass indexes to output anchors and not use them internally at all.

happenslol commented 5 years ago

Let me sleep on this - using Vec indices just feels brittle to me. I'll comment on the PR next.

happenslol commented 5 years ago

Fixed in #22!

covercash2 commented 5 years ago

thanks for fixing my mistakes guys! i've been outta the loop apparently.

amethyst / sheep

Improve named meta format #5