Open jonas4climate opened 3 years ago
Hi Jonas, I had a quick look at the problem and first fixed the many issues with outdated libraries and installation. That part is fixed for now, and I was able to reproduce your error exactly. I will take a closer look at it in the next couple of days.
I guess that this is actually a data issue in the MUSCIMA++ dataset. If you debug the application, you should be able to track down the sample that is causing this problem. Feel free to also create a ticket in this repo: https://github.com/OMR-Research/muscima-pp if you can find the culprit.
Hello Alexander,
Thank you so much for your quick response and for looking into this. I will try to find out which data points were causing the issue and then create a ticket.
I did some debugging in the data_pool.py code to find the underlying problem files, which are the file pairs listed further down (see below for details). If this is not a loading issue but these are invalid bounding boxes, I can create a ticket for these in the dataset repo as you mentioned. Would fixing this take long, and do you possibly have an idea how to hot-fix this on my side for now, i.e. using only valid data? I noticed strange shapes of the image masks it attempted to use, so I am assuming these are bounding box issues in the dataset. Investigating further, I noticed that all issues actually satisfy this condition:
t < 0 or b > image.shape[0] or l < 0 or r > image.shape[1]
meaning that they must all be out of bounds for the 2D image array.
However, there are quite a lot of them: the for loop loading the bounding boxes for each vertex in each MuNG had 2,223 exceptions and 60,353 valid executions, so about 3.5% of labels were invalid? This appears to be a rather large number.
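As a possible stopgap for the hot-fix question above, the out-of-bounds condition could be turned into a small filter that skips invalid boxes instead of raising. This is only a sketch: `box_in_bounds` and `split_valid_boxes` are hypothetical helper names that do not exist in the project, and boxes are assumed to be `(top, left, bottom, right)` tuples as in the condition above. The check is slightly stricter than that condition because it also rejects inverted boxes.

```python
from typing import Iterable, List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right); assumed order


def box_in_bounds(box: Box, image_shape: Tuple[int, int]) -> bool:
    """Return True if the box lies fully inside an image of shape (height, width)."""
    t, l, b, r = box
    height, width = image_shape
    # Negation of the out-of-bounds condition, plus a guard against inverted boxes.
    return 0 <= t <= b <= height and 0 <= l <= r <= width


def split_valid_boxes(boxes: Iterable[Box],
                      image_shape: Tuple[int, int]) -> Tuple[List[Box], List[Box]]:
    """Separate boxes into (valid, invalid) lists without raising."""
    valid: List[Box] = []
    invalid: List[Box] = []
    for box in boxes:
        (valid if box_in_bounds(box, image_shape) else invalid).append(box)
    return valid, invalid


image_shape = (1306, 3456)
boxes = [(100, 200, 150, 260), (1984, 912, 2048, 919), (1291, 2799, 1314, 2883)]
valid, invalid = split_valid_boxes(boxes, image_shape)
print(len(valid), len(invalid))  # 1 2

# Share of failing iterations reported above.
print(f"{2223 / (2223 + 60353):.2%}")  # 3.55%
```

Skipping boxes this way only hides the symptom; it does not explain why so many boxes fall outside the image.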
Issue 2028 (invalid bounding box): Failed setting mask [1984:2048, 912:919], img size (1306, 3456), res mask shape (0, 7)
data/mungs/CVC-MUSCIMA_W-13_N-16_D-ideal.xml - data/images/CVC-MUSCIMA_W-19_N-19_D-ideal.png
Issue 2132 (invalid bounding box): Failed setting mask [1291:1314, 2799:2883], img size (1306, 3456), res mask shape (15, 84)
data/mungs/CVC-MUSCIMA_W-13_N-16_D-ideal.xml - data/images/CVC-MUSCIMA_W-19_N-19_D-ideal.png
Issue 2222 (invalid bounding box): Failed setting mask [1943:2061, 207:3337], img size (1306, 3456), res mask shape (0, 3130)
data/mungs/CVC-MUSCIMA_W-13_N-16_D-ideal.xml - data/images/CVC-MUSCIMA_W-19_N-19_D-ideal.png
{ ('data/images/CVC-MUSCIMA_W-19_N-04_D-ideal.png', 'data/mungs/CVC-MUSCIMA_W-38_N-18_D-ideal.xml'), ('data/images/CVC-MUSCIMA_W-12_N-04_D-ideal.png', 'data/mungs/CVC-MUSCIMA_W-48_N-02_D-ideal.xml'), ('data/images/CVC-MUSCIMA_W-42_N-08_D-ideal.png', 'data/mungs/CVC-MUSCIMA_W-39_N-12_D-ideal.xml'), ('data/images/CVC-MUSCIMA_W-31_N-07_D-ideal.png', 'data/mungs/CVC-MUSCIMA_W-09_N-13_D-ideal.xml'), ('data/images/CVC-MUSCIMA_W-17_N-01_D-ideal.png', 'data/mungs/CVC-MUSCIMA_W-46_N-20_D-ideal.xml'), ('data/images/CVC-MUSCIMA_W-09_N-06_D-ideal.png', 'data/mungs/CVC-MUSCIMA_W-24_N-01_D-ideal.xml'), ('data/images/CVC-MUSCIMA_W-08_N-15_D-ideal.png', 'data/mungs/CVC-MUSCIMA_W-25_N-12_D-ideal.xml'), ('data/images/CVC-MUSCIMA_W-09_N-13_D-ideal.png', 'data/mungs/CVC-MUSCIMA_W-04_N-20_D-ideal.xml'), ('data/images/CVC-MUSCIMA_W-02_N-13_D-ideal.png', 'data/mungs/CVC-MUSCIMA_W-27_N-16_D-ideal.xml'), ('data/images/CVC-MUSCIMA_W-19_N-19_D-ideal.png', 'data/mungs/CVC-MUSCIMA_W-13_N-16_D-ideal.xml'), ('data/images/CVC-MUSCIMA_W-22_N-15_D-ideal.png', 'data/mungs/CVC-MUSCIMA_W-09_N-06_D-ideal.xml'), ('data/images/CVC-MUSCIMA_W-32_N-09_D-ideal.png', 'data/mungs/CVC-MUSCIMA_W-08_N-10_D-ideal.xml'), ('data/images/CVC-MUSCIMA_W-02_N-06_D-ideal.png', 'data/mungs/CVC-MUSCIMA_W-27_N-03_D-ideal.xml') }
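The `res mask shape` values in the messages above are consistent with ordinary slice clipping: a slice that runs past the end of an axis is silently shortened rather than raising, so an out-of-range box yields a smaller or empty mask. The sketch below reproduces the three reported shapes using only the standard library. `clipped_shape` is a hypothetical helper, and it relies on `slice.indices`, which applies the same clipping rule as basic NumPy slicing.

```python
from typing import Tuple


def clipped_shape(box: Tuple[int, int, int, int],
                  image_shape: Tuple[int, int]) -> Tuple[int, int]:
    """Shape that image[t:b, l:r] would have for an image of the given shape."""
    t, l, b, r = box
    height, width = image_shape
    # slice.indices clips start/stop to the axis length, like sequence slicing does.
    rows = len(range(*slice(t, b).indices(height)))
    cols = len(range(*slice(l, r).indices(width)))
    return rows, cols


image_shape = (1306, 3456)
boxes = [(1984, 912, 2048, 919), (1291, 2799, 1314, 2883), (1943, 207, 2061, 3337)]
shapes = [clipped_shape(box, image_shape) for box in boxes]
print(shapes)  # [(0, 7), (15, 84), (0, 3130)]
```

The slice itself therefore does not fail. The `ValueError` presumably comes from the later `set_mask` call receiving a mask smaller than the bounding box, although that is an inference from the output rather than something confirmed in the library code.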
data_pool.py
def __load_munglinker_data(mung_root: str, images_root: str,
                           include_names: List[str] = None,
                           max_items: int = None,
                           exclude_classes=None,
                           masks_to_bounding_boxes=False):
    if exclude_classes is None:
        exclude_classes = {}

    all_mung_files = glob(mung_root + "/**/*.xml", recursive=True)
    mung_files_in_this_split = [f for f in all_mung_files if
                                os.path.splitext(os.path.basename(f))[0] in include_names]

    all_image_files = glob(images_root + "/**/*.png", recursive=True)
    image_files_in_this_split = [f for f in all_image_files if
                                 os.path.splitext(os.path.basename(f))[0] in include_names]

    mungs = []
    images = []
    n_faulty = 0
    n_valid = 0
    faulty_set = set()
    for mung_file, image_file in zip(mung_files_in_this_split, image_files_in_this_split):
        mung = __load_mung(mung_file, exclude_classes)
        mungs.append(mung)

        image = __load_image(image_file)
        images.append(image)

        # This is for training on bounding boxes,
        # which needs to be done in order to then process
        # R-CNN detection outputs with Munglinker trained on ground truth
        if masks_to_bounding_boxes:
            for mungo in mung.vertices:
                t, l, b, r = mungo.bounding_box
                try:
                    image_mask = image[t:b, l:r]
                    mungo.set_mask(image_mask)
                    n_valid += 1
                except ValueError:
                    faulty_set.add((image_file, mung_file))
                    if (t < 0 or b > image.shape[0] or l < 0 or r > image.shape[1]):
                        print(f'Issue {n_faulty} (invalid bounding box): Failed setting mask [{t}:{b}, {l}:{r}], img size {image.shape}, res mask shape {image_mask.shape}\n{mung_file} - {image_file}\n\n')
                    else:
                        print(f'Issue {n_faulty} (unknown)')
                    n_faulty += 1

        if max_items is not None:
            if len(mungs) >= max_items:
                break

    print(f'\nProblems with annotations for file pairs: {faulty_set}\n')
    print(f'Unproblematic iterations: {n_valid}')
    print(f'Problematic iterations: {n_faulty}\n\n')
    return mungs, images
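One detail in the reported pairs is that the MuNG and image file names never match (for example `CVC-MUSCIMA_W-13_N-16_D-ideal.xml` alongside `CVC-MUSCIMA_W-19_N-19_D-ideal.png`). `glob` returns results in arbitrary order, so if the two file lists come back in different orders, the `zip(...)` in the loop above would pair unrelated files. That might explain at least part of the out-of-bounds boxes, though this is a guess rather than a confirmed diagnosis. A minimal sketch of pairing by base name instead, where `pair_by_basename` is a hypothetical helper that is not part of the project:

```python
import os
from typing import Dict, List, Tuple


def pair_by_basename(mung_files: List[str],
                     image_files: List[str]) -> List[Tuple[str, str]]:
    """Pair each MuNG file with the image sharing its base name (extension ignored)."""

    def stem(path: str) -> str:
        return os.path.splitext(os.path.basename(path))[0]

    images_by_stem: Dict[str, str] = {stem(f): f for f in image_files}
    pairs: List[Tuple[str, str]] = []
    for mung_file in sorted(mung_files):  # sorted for a deterministic order
        image_file = images_by_stem.get(stem(mung_file))
        if image_file is not None:  # skip MuNGs without a matching image
            pairs.append((mung_file, image_file))
    return pairs


mungs = ["data/mungs/CVC-MUSCIMA_W-13_N-16_D-ideal.xml",
         "data/mungs/CVC-MUSCIMA_W-19_N-19_D-ideal.xml"]
images = ["data/images/CVC-MUSCIMA_W-19_N-19_D-ideal.png",
          "data/images/CVC-MUSCIMA_W-13_N-16_D-ideal.png"]
for mung_file, image_file in pair_by_basename(mungs, images):
    print(mung_file, "-", image_file)
```

If the mismatch is the cause, iterating over these pairs in place of the `zip(...)` should make the number of failing bounding boxes drop sharply; if it does not, the dataset annotations remain the more likely culprit.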
I am currently working on my final year project, further exploring Machine Learning applied to the notation assembly process, and have encountered an issue when attempting to run the train.py file from your project. The downloading of the data did not work with the script by default, and I had to modify the requirements.txt file to include all dependencies needed, but then did get it to run. However, at 13% into the loading process I now get this issue:
Here is the requirements.txt, in case the issue originates there:
Do you have any idea where the issue could originate or if this is a simple fix?
Thank you so much.