STFCcommunity / data

Food for bots, wikis, yournameits
Other
12 stars 13 forks source link

Sort levels from lowest to highest #31

Closed blackjack4494 closed 3 years ago

blackjack4494 commented 3 years ago

Uhm. I just noticed the data at least for Academy is not sorted from lowest to highest level. Is this the final version? Sorted data got huge advantages when we process it.

I previously worked with another json dataset which was sorted by level But I implemented advanced checks anyway that's why I even noticed

        # arrays starts at 0 so take offset into account
        if lvl > 0:
            lvl -= 1
        if name in data[_NAME_KEY]:
            # check if it's really the right level
            if data[_LEVELS_KEY][lvl]['level'] == lvl+1:
                return data[_LEVELS_KEY][lvl]
            else:
                if dbg:
                    print('[DEBUG] MISMATCH ERROR - Returned (%s, Level %s) but requested level %s' % (name, data[_LEVELS_KEY][lvl]['level'], lvl+1))
                raise ValueError('Level not found')

That's throwing me this error

PS D:\Workspace\vsc\lcars\datacollection\db> py .\extract_building.py
[DEBUG] MISMATCH ERROR - Returned (Academy, Level 8) but requested level 18
Traceback (most recent call last):
  File "D:\Workspace\vsc\lcars\datacollection\db\extract_building.py", line 279, in <module>
    print(get_building('Academy', 18))
  File "D:\Workspace\vsc\lcars\datacollection\db\extract_building.py", line 53, in get_building
    return _get_building(name, lvl, data, max_distance)
  File "D:\Workspace\vsc\lcars\datacollection\db\extract_building.py", line 95, in _get_building
    raise ValueError('Level not found')
ValueError: Level not found

If I want to return a single level I could use return _[_LEVELS_KEY][lvl] or return _[_LEVELS_KEY][_lower:_upper+1] for level ranges. But that's not possible the way it is.

If you want to tell me the data will stay in it's form I have two options:

  1. Sort data myself (easy solution but will require computation each time)
  2. Rewrite code to use for loops to gather all wanted levels

Honestly I don't like the 2nd solution.

Cheers.

Edit: Current workaround for me is sorting it a['levels'].sort(key=lambda x: (x.get('level')))

WebSpider commented 3 years ago

There is no guaranteed ordering in json as far as i know. I assume the lists are unsorted and use info = next(sh for sh in jsons['haystack'] if sh['property'].lower() == needle.lower())

blackjack4494 commented 3 years ago

There is no guaranteed ordering in json as far as i know. I assume the lists are unsorted and use info = next(sh for sh in jsons['haystack'] if sh['property'].lower() == needle.lower())

According to this RFC

An array is an ordered sequence of zero or more values

which levels are. And honestly I never came around ANY json that had arrays/lists scrambled like here.

That leaves me wondering if

  1. the initial data was not sorted?
  2. something scrambles the order for whatever reason.

Sorted Operations Json Original unsorted Operations Json Sorted Academy Json Original unsorted Academy Json

You can try to commit (and push) one of the sorted ones and see what happens. Something else that somehow annoys me is that there is no clear indentation in the original ones. They have 4 Spaces/1 Tab at the very start and then using 2 Spaces.

Just by sorting from lowest to highest level and following a strict 2 Spaces indentation the file size went down roughly 10% This should definitely be considered.

f = open('operations.json')
a = json.load(f)
f.close()

# workaround to make sure data is sorted by level
a['levels'].sort(key=lambda x: (x.get('level')))

json_data = json.dumps(a, indent=2)
json_file = open('operations_sorted.json', 'w')
json_file.write(json_data)
json_file.close()

This is the script I used for buildings.

walakymas commented 3 years ago

There is no guaranteed ordering in json as far as i know. I assume the lists are unsorted and use info = next(sh for sh in jsons['haystack'] if sh['property'].lower() == needle.lower())

List is ordered, only the order of dictionary elements are questionable.

blackjack4494 commented 3 years ago

There is no guaranteed ordering in json as far as i know. I assume the lists are unsorted and use info = next(sh for sh in jsons['haystack'] if sh['property'].lower() == needle.lower())

List is ordered, only the order of dictionary elements are questionable.

Yes that is never guaranteed nor does it matter since you always need (and should have) a key which is obvious since it's a key/value store. Dictionaries are meant to be predictable when it comes to their keys.

WebSpider commented 3 years ago
f = open('operations.json')
a = json.load(f)
f.close()

# workaround to make sure data is sorted by level
a['levels'].sort(key=lambda x: (x.get('level')))

json_data = json.dumps(a, indent=2)
json_file = open('operations_sorted.json', 'w')
json_file.write(json_data)
json_file.close()

I just did that for shipyard.json, and the difference is exactly 1 byte: The file ending character, which is added by pre-commit. Possibly you were looking at a file that somehow was not caught by formatters in pre-commit, we have set this to an indent of 2 spaces, which is enough to make it human-readable.

I do agree tho, that sorting lists adds a lot of convenience, but we should not trust they are sorted, since we currently do not enforce sequential list ordering in CI. Also, this would only work in a structure where we have a value inside that we can use to sort by (such as the 'level' key in the 'levels' list). It will not work for instance for the 'requirements' that are nested inside the 'level' list.

I'd happily accept a PR that sorts the lists nested, or a PR that suggests a completely different way to handle the problem,. I can think of various structures we can use to solve this

ccMatrix commented 3 years ago

The input is extremely unsorted. Basically they are "ordered" chronologically by when the info was requested from the old bot. Some things are a bit more sorted when they are all in one field which is processed in one run. But for other elements there is no sorting in the original data. It would of course make sense to sort them but until a validation can check if that is really the case anyone could commit a file where the order does not follow a strict sequence (like tier 1,2,3... or level 1,2,3...). So maybe the result of this ticket could be an extension to the pre-commit call which does validate (or even fix) the sorting that can be applied on all PRs.

blackjack4494 commented 3 years ago

The input is extremely unsorted. Basically they are "ordered" chronologically by when the info was requested from the old bot. Some things are a bit more sorted when they are all in one field which is processed in one run. But for other elements there is no sorting in the original data. It would of course make sense to sort them but until a validation can check if that is really the case anyone could commit a file where the order does not follow a strict sequence (like tier 1,2,3... or level 1,2,3...). So maybe the result of this ticket could be an extension to the pre-commit call which does validate (or even fix) the sorting that can be applied on all PRs.

You could run a simple test script like in the code in my starter post

# check if it's really the right level
if data[_LEVELS_KEY][lvl]['level'] == lvl+1:

The whole snippet to understand it better.
You can then simply deny the PR or automatically sort the data.
Whatever you think would be better. We can talk about it :)

ccMatrix commented 3 years ago

Having it as part of the pre-commit would basically do both. Sort the arrays and also block the PR. The sorting logic is a bit different for each type. Ships for example have 2 levels: tier and levels below that.