Any suggestions for speeding up the pattern load onto DMD

zhaoguangyuan123 commented 3 years ago

I tested the efficiency of loading a single 1920x1080 image onto the DMD. All the other parts are fast, but the encode function takes 15 s to finish and the bitload function takes 3 s.

I figured out that part of the reason is the for loop, and part is that single numbers are frequently loaded onto the DMD with the command function.

Has anyone figured out a more efficient way of doing this? Thanks a lot.
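As a generic illustration of why a per-element Python loop is slow compared with array operations, here is a minimal sketch that computes the run lengths of one 1920-pixel binary row in both styles. It is not the library's encoder and does not produce the DMD's format; the names run_lengths_loop and run_lengths_numpy are made up for this example, and both assume a non-empty row.

```python
import time
import numpy as np

def run_lengths_loop(row):
    """Run lengths of consecutive equal values, one element at a time."""
    lengths = []
    count = 1
    for k in range(1, len(row)):
        if row[k] == row[k - 1]:
            count += 1
        else:
            lengths.append(count)
            count = 1
    lengths.append(count)
    return lengths

def run_lengths_numpy(row):
    """Same result, but the comparisons are done by NumPy in one pass."""
    row = np.asarray(row)
    # indices at which the value differs from the previous element
    starts = np.flatnonzero(row[1:] != row[:-1]) + 1
    boundaries = np.concatenate(([0], starts, [len(row)]))
    return np.diff(boundaries).tolist()

row = np.random.default_rng(0).integers(0, 2, size=1920, dtype=np.uint8)
for fn in (run_lengths_loop, run_lengths_numpy):
    start = time.perf_counter()
    for _ in range(100):
        fn(row)
    print(fn.__name__, time.perf_counter() - start)
```

The timings depend on the machine, so none are quoted here; the point is only that both functions return the same list while the second one avoids the Python-level loop.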

devWork1 commented 3 years ago

How many images are you attempting to load and what type and format are they in?

Typically I have found the speeds to be reasonable. I have, however, run into the problem of the displayed images showing a weird subtraction and addition from one image to the next. I think it has to do with it trying to save time by only displaying what changed from the previous image while keeping everything that did not. However, I need to display each image in the list in its entirety. Did you run into this? I'm using bitmaps opened as numpy.asarrays.

e841018 commented 3 years ago

I have an accelerated version of the encoding function, and here's the repository.

Simply substituting the encode function should work, I think.

If there are any problems, feel free to open issues. I'm working on something else and may not have time to fix them until December, though.

ppozzi commented 3 years ago

@e841018 That's awesome, how much faster does your version of the code run? If i can find the time, am i allowed to include your function in the main repository? (of course with acknowledgements)

@devWork1 From the sound of it, it seems you are loading wrong images (i may be wrong). Please be aware that the library only works with binary patterns, so it requires the input arrays to only contain ones and zeros. If you simply load any bitmap, it will probably contain other numbers, and i frankly don't know how the code will react (yes, i should have a check and raise an error, sorry about that).
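A minimal sketch of the kind of check mentioned above, assuming patterns arrive as NumPy-compatible arrays. The helper name to_binary_pattern and its threshold parameter are hypothetical and not part of the library.

```python
import numpy as np

def to_binary_pattern(img, threshold=None):
    """Return a uint8 array of zeros and ones, or raise if the input is not binary.

    If threshold is given, pixels strictly above it become 1 and the rest 0.
    """
    arr = np.asarray(img)
    if threshold is not None:
        return (arr > threshold).astype(np.uint8)
    if not np.isin(arr, (0, 1)).all():
        raise ValueError(
            "pattern must contain only 0 and 1; pass a threshold to binarize it"
        )
    return arr.astype(np.uint8)

gray = np.array([[0, 255], [130, 12]], dtype=np.uint8)  # e.g. a loaded bitmap
binary = to_binary_pattern(gray, threshold=127)
```

Calling to_binary_pattern(gray) without a threshold raises ValueError, which is the behaviour the comment above says the library should probably have.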

e841018 commented 3 years ago

@ppozzi I don't remember the exact time, but I think it is at least as fast as the reference C++ code given by TI. The actual runtime depends on the image and processor (supports AVX2 or not etc.). Generally finishes in less than 500 ms.

Of course please include my code if you have time. I'm glad to let more people use it.

zhaoguangyuan123 commented 3 years ago

Yes, I have tested the encoder offered by Yu (@e841018). The new encode function is super fast: it takes only 0.1 s to finish encoding on my Mac with an i9 CPU. I will test it with the DMD and see the real result on the device. Thanks a lot, Yu! @e841018

Moreover, how do you select the driver in Zadig [https://zadig.akeo.ie/]? Do you choose HidUsb (v6.1.7601.24386), libusb-win32 (v1.2.6.0), or WinUSB (v6.1.7600.16385)? I am a bit confused about this now. @e841018 @ppozzi @devWork1

zhaoguangyuan123 commented 3 years ago

@devWork1 In my application I need to upload images onto the DMD continuously, so the speed is quite important.

ppozzi commented 3 years ago

@zhaoguangyuan123 @e841018 Awesome! Then i will, as soon as i have time, integrate your encoder in the main library. I should be able to test it on my own, as running just that function should not require access to a working DMD, i'll just make sure the outputs of the two functions are the same to be sure i'm using it correctly.
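Such a comparison can be sketched without any hardware. The snippet below assumes each encoder returns a bytes-like object (if an encoder returns a tuple, wrap it in a lambda that picks the byte stream). The helpers first_mismatch and random_batches are hypothetical, and the two pack_* functions are stand-ins used only to make the example runnable; they are not the library's encoders.

```python
import numpy as np

def first_mismatch(encode_a, encode_b, batches):
    """Return the index of the first batch whose encodings differ, else None."""
    for idx, batch in enumerate(batches):
        if bytes(encode_a(batch)) != bytes(encode_b(batch)):
            return idx
    return None

def random_batches(n_batches, n_images=3, shape=(8, 16), seed=0):
    """Lists of random 0/1 uint8 images (small shape for a quick test)."""
    rng = np.random.default_rng(seed)
    return [[rng.integers(0, 2, size=shape, dtype=np.uint8)
             for _ in range(n_images)]
            for _ in range(n_batches)]

# Stand-in encoders for demonstration only; replace with the two real ones.
def pack_stacked(batch):
    return np.packbits(np.stack(batch)).tobytes()

def pack_concatenated(batch):
    return np.packbits(np.concatenate([img.ravel() for img in batch])).tobytes()

batches = random_batches(5)
print(first_mismatch(pack_stacked, pack_concatenated, batches))
```

Byte-for-byte equality is a strict criterion: two run-length encoders could in principle emit different streams that decode to the same image, so a mismatch is a reason to look closer rather than proof of a bug.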

@zhaoguangyuan123 unfortunately, however fast you can run the encoder, you will probably still be limited by the darn USB 1.1 connection to the devkit for upload. I don't think it's a solvable problem. You may try to use the video interface to at least update at 60 Hz, but beware of synchronization: i tried it back in the day, and even if you keep sending the same image, the output is extremely flickery, so you would need to somehow synchronize your detector with the refresh rate of the DMD.
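A back-of-the-envelope sketch of that limit, assuming the nominal USB 1.1 full-speed signalling rate of 12 Mbit/s. Real throughput over the devkit's HID interface will be lower, so this is only a lower bound on upload time. The function names are made up for this example, and the 64-byte packet size is taken from the command function in the listing further down this thread.

```python
def min_transfer_seconds(n_bytes, bits_per_second=12_000_000):
    """Lower bound on transfer time at the given raw line rate."""
    return n_bytes * 8 / bits_per_second

def packets_needed(n_bytes, packet_size=64):
    """Number of fixed-size packets needed to carry n_bytes (ceiling division)."""
    return -(-n_bytes // packet_size)

# An uncompressed 24-bit 1920x1080 image, i.e. 24 binary patterns.
uncompressed = 1920 * 1080 * 3
print(min_transfer_seconds(uncompressed))
print(packets_needed(uncompressed))
```

Because the bound scales with the number of bytes sent, how well the patterns compress matters as much for upload time as how fast the encoder runs.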

zhaoguangyuan123 commented 3 years ago

@ppozzi @e841018 The code Yu shared works very well. It speeds up encoding nearly 40-fold.

kevofwg commented 2 years ago

Hi, I have a question about the Pycrafter6500 encoder: has it been updated recently, and is it ready to be used? We would gladly test it out.

taladjidi commented 2 years ago

Hi, I included @e841018 's great function in the main code such that the functionality is exactly the same as previously. I can't create a PR. Would it be possible to include it in the main branch ? Cheers. Ps : Sorry for sharing code in such a nasty way :)

import usb.core
import usb.util
import numpy as np
import sys
import struct

"""
Many thanks to Ashu for the great encoding function
https://github.com/e841018/ERLE
I merely wrapped it into the original code ...
@author Tangui Aladjidi
"""

pack32be = struct.Struct('>I').pack  # uint32 big endian

# function that converts a number into a bit string of given length
def convlen(a, length):
    b = bin(a)[2:]
    padding = length-len(b)
    b = '0'*padding+b
    return b

# function that converts a bit string into a given number of bytes

def bitstobytes(a):
    bytelist = []
    if len(a) % 8 != 0:
        padding = 8 - len(a) % 8
        a = '0'*padding+a
    for i in range(len(a)//8):
        bytelist.append(int(a[8*i:8*(i+1)], 2))
    bytelist.reverse()
    return bytelist

# functions that build the 48-byte image header and encode up to 24 binary
# images as an enhanced run length encoded byte stream
def get_header():
    '''
    generate header defined in section 2.4.2
    '''
    header = bytearray(0)
    # signature
    header += bytearray([0x53, 0x70, 0x6c, 0x64])
    # width
    header += bytearray([1920 % 256, 1920//256])
    # height
    header += bytearray([1080 % 256, 1080//256])
    # number of bytes, will be overwritten later
    header += bytearray(4)
    # reserved
    header += bytearray([0xff]*8)
    # background color (BB GG RR 00)
    header += bytearray(4)
    # reserved
    header.append(0)
    # compression, 0=Uncompressed, 1=RLE, 2=Enhanced RLE
    header.append(2)
    # reserved
    header.append(1)
    header += bytearray(21)
    return header

header_template = get_header()

def mergeimages(images):
    '''
    merge up to 24 binary images into a single 24-bit image, each pixel is an
    uint32 of format 0x00BBGGRR
    '''
    image32 = np.zeros((1080, 1920), dtype=np.uint32)
    n_img = len(images)
    batches = [8]*(n_img//8)
    if n_img % 8:
        batches.append(n_img % 8)
    for i, batch_size in enumerate(batches):
        image8 = np.zeros((1080, 1920), dtype=np.uint8)
        for j in range(batch_size):
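            # shift with explicit dtypes so the result does not depend on
            # NumPy's integer promotion rules
            image8 += np.asarray(images[i*8+j], dtype=np.uint8) << j
        image32 += image8.astype(np.uint32) << (i*8)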
    return image32

def bgr(pixel):
    '''
    convert an uint32 pixel into [B, G, R] bytes
    '''
    return pack32be(pixel)[1:4]

def enc128(num):
    '''
    encode num (up to 32767) into 1 or 2 bytes
    '''
    return bytes([(num & 0x7f) | 0x80, num >> 7]) if num >= 128 else bytes([num])

def run_len(row, idx):
    '''
    find the length of the longest run starting from idx in row
    '''
    stride = 128
    length = len(row)
    j = idx
    while j < length and row[j]:
        if j % stride == 0 and np.all(row[j:j+stride]):
            j += min(stride, length-j)
        else:
            j += 1
    return j-idx

def encode_row(row, same_prev):
    '''
    encode a row of length 1920 with the format described in section 2.4.3.2
    '''
    # same_prev: bool array indicating if same as previous row,
    # shape = (1920, ); computed by the caller (all False for the first row)
    # bool array indicating if same as next element, shape = (1919, )
    same = np.logical_not(np.diff(row))
    # same as previous row or same as next element, shape = (1919, )
    same_either = np.logical_or(same_prev[:1919], same)

    j = 0
    compressed = bytearray(0)
    bytecount = 0
    while j < 1920:
        # copy n pixels from previous line
        if same_prev[j]:
            r = run_len(same_prev, j+1) + 1
            j += r
            compressed += b'\x00\x01' + enc128(r)
            bytecount += 2 + len(enc128(r))
        # repeat single pixel n times
        elif j < 1919 and same[j]:
            r = run_len(same, j+1) + 2
            j += r
            compressed += enc128(r) + bgr(row[j-1])
            bytecount += len(enc128(r)) + 3

        # single uncompressed pixel
        elif j > 1917 or same_either[j+1]:
            compressed += b'\x01' + bgr(row[j])
            bytecount += 4
            j += 1

        # multiple uncompressed pixels
        else:
            j_start = j
            pixels = bgr(row[j]) + bgr(row[j+1])
            bytecount += 6
            j += 2
            while j == 1919 or not same_either[j]:
                pixels += bgr(row[j])
                bytecount += 3
                j += 1
                if j == 1920:
                    break
            compressed += b'\x00' + enc128(j-j_start) + pixels
            bytecount += 1 + len(enc128(j-j_start))

    return compressed + b'\x00\x00', bytecount + 2

def encode(images):
    '''
    encode image with the format described in section 2.4.3.2.1
    '''
    # header
    encoded = header_template.copy()
    bytecount = 48
    # uint32 array, shape = (1080, 1920)
    image = mergeimages(images)

    # image content
    for i in range(1080):
        # bool array indicating if same as previous row, shape = (1920, )
        same_prev = np.zeros(1920, dtype=bool) if i == 0 else image[i] == image[i-1]
        row, bytecount_row = encode_row(image[i], same_prev)
        encoded += row
        bytecount += bytecount_row

    # end of image
    encoded += b'\x00\x01\x00'
    bytecount += 3
    # pad to 4-byte boundary; compute the padding once, before extending
    # encoded, so that bytecount also includes it
    padding = (-len(encoded)) % 4
    encoded += bytearray(padding)
    bytecount += padding

    # overwrite number of bytes in header
    # uint32 little endian, offset=8
    struct.pack_into('<I', encoded, 8, len(encoded))
    return encoded, bytecount

# a dmd controller class
class dmd():
    def __init__(self):
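        self.dev = usb.core.find(idVendor=0x0451, idProduct=0xc900)
        # usb.core.find returns None when no matching device is connected
        if self.dev is None:
            sys.exit("DMD not found: no USB device with idVendor=0x0451, idProduct=0xc900")
        i = self.dev[0].interfaces()[0].bInterfaceNumber
        if self.dev.is_kernel_driver_active(i):
            try:
                self.dev.detach_kernel_driver(i)
            except usb.core.USBError as e:
                sys.exit(f"Could not detach kernel driver from interface({i}): {str(e)}")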
        self.dev.set_configuration()

        self.ans = []

    # standard usb command function
    def command(self, mode, sequencebyte, com1, com2, data=None):
        # treat a missing payload as empty so that len(data) below is safe
        if data is None:
            data = []
        buffer = []
        flagstring = ''
        if mode == 'r':
            flagstring += '1'
        else:
            flagstring += '0'
        flagstring += '1000000'
        buffer.append(bitstobytes(flagstring)[0])
        buffer.append(sequencebyte)
        temp = bitstobytes(convlen(len(data)+2, 16))
        buffer.append(temp[0])
        buffer.append(temp[1])
        buffer.append(com2)
        buffer.append(com1)

        if len(buffer)+len(data) < 65:
            for i in range(len(data)):
                buffer.append(data[i])
            for i in range(64-len(buffer)):
                buffer.append(0x00)
            self.dev.write(1, buffer)

        else:
            for i in range(64-len(buffer)):
                buffer.append(data[i])
            self.dev.write(1, buffer)
            buffer = []
            j = 0
            while j < len(data)-58:
                buffer.append(data[j+58])
                j += 1
                if j % 64 == 0:
                    self.dev.write(1, buffer)
                    buffer = []
            if j % 64 != 0:
                while j % 64 != 0:
                    buffer.append(0x00)
                    j += 1
                self.dev.write(1, buffer)
        self.ans = self.dev.read(0x81, 64)

    # functions for checking error reports in the dlp answer
    def checkforerrors(self):
        self.command('r', 0x22, 0x01, 0x00, [])
        if self.ans[6] != 0:
            print(self.ans[6])

    # function printing all of the dlp answer
    def readreply(self):
        for i in self.ans:
            print(hex(i))

    # functions for idle mode activation
    def idle_on(self):
        self.command('w', 0x00, 0x02, 0x01, [int('00000001', 2)])
        self.checkforerrors()

    def idle_off(self):
        self.command('w', 0x00, 0x02, 0x01, [int('00000000', 2)])
        self.checkforerrors()

    # functions for power management
    def standby(self):
        self.command('w', 0x00, 0x02, 0x00, [int('00000001', 2)])
        self.checkforerrors()

    def wakeup(self):
        self.command('w', 0x00, 0x02, 0x00, [int('00000000', 2)])
        self.checkforerrors()

    def reset(self):
        self.command('w', 0x00, 0x02, 0x00, [int('00000010', 2)])
        self.checkforerrors()

    # test write and read operations, as reported in the dlpc900 programmer's
    # guide
    def testread(self):
        self.command('r', 0xff, 0x11, 0x00, [])
        self.readreply()

    def testwrite(self):
        self.command('w', 0x22, 0x11, 0x00,
                     [0xff, 0x01, 0xff, 0x01, 0xff, 0x01])
        self.checkforerrors()

    # some self explaining functions
    def changemode(self, mode):
        self.command('w', 0x00, 0x1a, 0x1b, [mode])
        self.checkforerrors()

    def startsequence(self):
        self.command('w', 0x00, 0x1a, 0x24, [2])
        self.checkforerrors()

    def pausesequence(self):
        self.command('w', 0x00, 0x1a, 0x24, [1])
        self.checkforerrors()

    def stopsequence(self):
        self.command('w', 0x00, 0x1a, 0x24, [0])
        self.checkforerrors()

    def configurelut(self, imgnum, repeatnum):
        img = convlen(imgnum, 11)
        repeat = convlen(repeatnum, 32)
        string = repeat + '00000' + img
        # named payload to avoid shadowing the builtin bytes
        payload = bitstobytes(string)
        self.command('w', 0x00, 0x1a, 0x31, payload)
        self.checkforerrors()

    def definepattern(self, index, exposure, bitdepth, color, triggerin,
                      darktime, triggerout, patind, bitpos):
        payload = []
        index = convlen(index, 16)
        index = bitstobytes(index)
        for i in range(len(index)):
            payload.append(index[i])
        exposure = convlen(exposure, 24)
        exposure = bitstobytes(exposure)
        for i in range(len(exposure)):
            payload.append(exposure[i])
        optionsbyte = ''
        optionsbyte += '1'
        bitdepth = convlen(bitdepth-1, 3)
        optionsbyte = bitdepth + optionsbyte
        optionsbyte = color + optionsbyte
        if triggerin:
            optionsbyte = '1' + optionsbyte
        else:
            optionsbyte = '0' + optionsbyte
        payload.append(bitstobytes(optionsbyte)[0])
        darktime = convlen(darktime, 24)
        darktime = bitstobytes(darktime)
        for i in range(len(darktime)):
            payload.append(darktime[i])
        triggerout = convlen(triggerout, 8)
        triggerout = bitstobytes(triggerout)
        payload.append(triggerout[0])
        patind = convlen(patind, 11)
        bitpos = convlen(bitpos, 5)
        lastbits = bitpos + patind
        lastbits = bitstobytes(lastbits)
        for i in range(len(lastbits)):
            payload.append(lastbits[i])
        self.command('w', 0x00, 0x1a, 0x34, payload)
        self.checkforerrors()

    def setbmp(self, index, size):
        payload = []
        index = convlen(index, 5)
        index = '0'*11+index
        index = bitstobytes(index)
        for i in range(len(index)):
            payload.append(index[i])
        total = convlen(size, 32)
        total = bitstobytes(total)
        for i in range(len(total)):
            payload.append(total[i])
        self.command('w', 0x00, 0x1a, 0x2a, payload)
        self.checkforerrors()

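    # bmp loading function: the image is sent in chunks of 504 bytes, each
    # prefixed by 2 bytes giving the chunk length. With the 6 header bytes
    # added by command() (flag, sequence, 2 length bytes, 2 command bytes),
    # a full chunk is 6+2+504=512 bytes, i.e. eight 64-byte hid packets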
    def bmpload(self, image, size):
        packnum = size//504+1
        counter = 0
        for i in range(packnum):
            if i % 100 == 0:
                print(i, packnum)
            payload = []
            if i < packnum-1:
                leng = convlen(504, 16)
                bits = 504
            else:
                leng = convlen(size % 504, 16)
                bits = size % 504
            leng = bitstobytes(leng)
            for j in range(2):
                payload.append(leng[j])
            for j in range(bits):
                payload.append(image[counter])
                counter += 1
            self.command('w', 0x11, 0x1a, 0x2b, payload)
            self.checkforerrors()

    def defsequence(self, images, exp, ti, dt, to, rep):
        self.stopsequence()
        arr = []
        for i in images:
            arr.append(i)
        num = len(arr)
        encodedimages = []
        sizes = []
        for i in range((num-1)//24+1):
            print('encoding...')
            if i < ((num-1)//24):
                imagedata, size = encode(arr[i*24:(i+1)*24])
            else:
                imagedata, size = encode(arr[i*24:])
            encodedimages.append(imagedata)
            sizes.append(size)
            if i < ((num-1)//24):
                for j in range(i*24, (i+1)*24):
                    self.definepattern(j, exp[j], 1, '111', ti[j], dt[j],
                                       to[j], i, j-i*24)
            else:
                for j in range(i*24, num):
                    self.definepattern(j, exp[j], 1, '111', ti[j], dt[j],
                                       to[j], i, j-i*24)
        self.configurelut(num, rep)
        for i in range((num-1)//24+1):
            self.setbmp((num-1)//24-i, sizes[(num-1)//24-i])
            print('uploading...')
            self.bmpload(encodedimages[(num-1)//24-i], sizes[(num-1)//24-i])
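In the listing above, mergeimages stores image k of a batch in bit k of each pixel, and defsequence addresses pattern j as image index j // 24 with bit position j - i*24. The following minimal sketch shows that packing and its inverse on small arrays. The names merge_bitplanes, extract_bitplane and pattern_location are hypothetical and only mirror the idea, not the exact code above.

```python
import numpy as np

def merge_bitplanes(images):
    """Pack up to 24 binary (0/1) images into one uint32 image, image k in bit k."""
    if len(images) > 24:
        raise ValueError("at most 24 binary images fit into one 24-bit image")
    merged = np.zeros(np.shape(images[0]), dtype=np.uint32)
    for k, img in enumerate(images):
        merged |= np.asarray(img, dtype=np.uint32) << np.uint32(k)
    return merged

def extract_bitplane(merged, k):
    """Recover binary image k from a merged image."""
    return ((merged >> np.uint32(k)) & np.uint32(1)).astype(np.uint8)

def pattern_location(j):
    """Map pattern number j to (24-bit image index, bit position)."""
    return divmod(j, 24)

rng = np.random.default_rng(0)
patterns = [rng.integers(0, 2, size=(4, 6), dtype=np.uint8) for _ in range(10)]
merged = merge_bitplanes(patterns)
recovered = [extract_bitplane(merged, k) for k in range(len(patterns))]
```

Here every entry of recovered equals the corresponding entry of patterns, and pattern_location(25) gives (1, 1): the second 24-bit image, bit 1.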

e841018 commented 2 years ago

@taladjidi

"Would it be possible to include it in the main branch?"

Do you mean the main branch of this repo (csi-dcsc/Pycrafter6500)? I ask because this is the first time I have seen your name here.

I didn't create a PR since my own version of Pycrafter6500 has major changes and no longer conforms to the original API. I was planning to release a full replacement of the whole library, but I didn't have time to debug some hardware-related issues. I think I'll give up releasing the full replacement and just make a PR to this repo.

By the way, I think it will be much easier to maintain the codebase if you import the function from a new module.
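One way to read that suggestion is that the controller code would import the encoder rather than carry its own copy. The sketch below is an assumption about how that could look, not the repo's actual layout: it assumes a module named erle exposing a function named encode is importable, and the helper load_encoder (hypothetical) returns None instead of failing when it is not.

```python
import importlib

def load_encoder(module_name="erle", function_name="encode"):
    """Return function_name from module_name, or None if it is unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, function_name, None)

encode = load_encoder()
if encode is None:
    print("erle module not found; falling back to the built-in encoder")
```

With the module sitting next to the controller file, a plain import of the function does the same job; the helper only adds an explicit fallback path.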

taladjidi commented 2 years ago

@e841018 Yes, I meant this repo. I only discovered it yesterday :)

Thanks for the advice about importing the function from a new module. For my application I just need a light standalone library, so I simply copy-pasted it into the existing structure, but in the long run importing would surely be better. In any case, thank you very much, as it allowed me to understand very quickly how these TI DMDs work.

ppozzi commented 2 years ago

Hello all, sorry i disappeared after promising a lifetime ago i would update the code. Of course i created a tidy little project in Pycharm and proceeded to forget about it. Anyway, as @e841018 said, his code has deviated a bit from my own base, and importing the function from his module with some modification to my code would be the best practice in this situation. As i said previously, I no longer have a physical evaluation module to test the code, so just making a change to the master branch of this repo would be a bit dangerous. I do have two options:

1) Wait for @e841018 to send a PR, trust him and accept it.
2) Make a branch with the new encoder, and ask you guys for feedback before committing the changes to the master.

I guess it's up to @e841018 to let me know what he prefers. Cheers

e841018 commented 2 years ago

I would prefer 1. I made sure the erle.py module is compatible with Python 2.7, and have just sent a PR.

Thanks for reviewing the changes in advance!

ppozzi commented 2 years ago

Done, i have accepted the pull request, and added an acknowledgement in the readme. I will close the issue for now, but please, let me know if anything breaks with the library, and i'll re-open it while applying changes. Thanks Ashu!
